I am trying to get all the separate sections of a Wikipedia article through the API.
I already know:
How to retrieve the complete text:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=1&titles=house&rvprop=content
How to retrieve a specific section of the text:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=1&titles=house&rvprop=content&rvsection=0
How can I retrieve all sections separately with one request? (As a JSON array, for example.)
What you ask is called parsing, because it requires interpretation of the wikitext source to split the page by sections etc. So the solution is given in https://www.mediawiki.org/wiki/API:Parsing_wikitext
1) Get the list of sections: https://www.mediawiki.org/w/api.php?action=parse&page=API:Parsing_wikitext&prop=sections
2) Request the parsed text of that section: https://www.mediawiki.org/w/api.php?action=parse&page=API:Parsing_wikitext&section=1&prop=text
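The two steps above can be sketched in Python roughly as follows. This is a minimal sketch, not an official client: the helper names (sections_url, section_text_url, section_indexes) are made up for illustration, format=json is added so the reply is JSON, and the response shape (a "parse" object holding a "sections" list whose entries carry an "index") is what the parse module returns as far as I know.

```python
from urllib.parse import urlencode

API = "https://www.mediawiki.org/w/api.php"


def sections_url(page):
    """Step 1: URL that lists the sections of a page."""
    params = {"action": "parse", "page": page, "prop": "sections", "format": "json"}
    return API + "?" + urlencode(params)


def section_text_url(page, index):
    """Step 2: URL that returns the parsed text of one section."""
    params = {
        "action": "parse",
        "page": page,
        "section": index,
        "prop": "text",
        "format": "json",
    }
    return API + "?" + urlencode(params)


def section_indexes(response):
    """Pull the section indexes out of a decoded step-1 response."""
    return [section["index"] for section in response["parse"]["sections"]]
```

You would fetch sections_url(...) with any HTTP client, decode the JSON, and then request section_text_url(...) once per index. That is still one request per section; the parse module does not return them all split up in a single call.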
I realize this question was asked four years ago, so possibly the following was not available then:
You can use the REST API described here: https://www.mediawiki.org/wiki/REST_API
The REST endpoints are described/documented here: https://en.wikipedia.org/api/rest_v1/#/
The mobile-sections endpoint (intended for consuming info for a mobile device) gives you a nice breakdown with headings, which sounds like what you are asking for.
Alternatively, the metadata endpoint returns a toc (table of contents) section which contains the same breakdown of headings.
Here is an example URL, fetching the mobile sections for the "Egyptian pyramids" page:
https://en.wikipedia.org/api/rest_v1/page/mobile-sections/Egyptian_pyramids
The advantage is that the response is in JSON format (which is what you were asking for).
I am trying to search by section with the Wikipedia API.
What I already know:
For the below:
https://en.wikipedia.org/w/api.php?format=xml&action=query&prop=revisions&titles=Game_of_Thrones_(season_1)&rvprop=content&rvsection=0
I know that rvsection=0 will give me section 0 of the Wikipedia page, and that I can change this to get different sections of the page, e.g. 1, 2, 3.
What I am wondering is how (or if) I can search by section name. For example, the Wikipedia page in the link above has a section named "Episodes". How can I search for this so that I get all the content from this section?
If this is not possible, is there a workaround? What I want to do is get episode information from different Wikipedia pages.
I have done some more research into this and have found a partial solution.
If we want to get a certain section then we need to query this information with the API below:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=(NAME_OF_WIKI_PAGE)&prop=sections&disabletoc=1
This will give us JSON info about names of each section.
Once we have section info, use the parse API to get the wikitext. If we want the HTML, we can change prop to text:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=(NAME_OF_WIKI)&prop=wikitext&section=(SECTION #)&disabletoc=1
As a result, we get the specific section we want formatted in JSON. The next step for me is sorting this and trying to get this HTML/wiki text into plain text.
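To go from a section name such as "Episodes" to the section number, one option is to look the name up in the section list first. The sketch below assumes each entry in "sections" has a "line" field with the heading text and an "index" field with the section number; find_section_index and section_wikitext_url are hypothetical helper names, not part of any library.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def find_section_index(response, name):
    """Return the index of the first section whose heading matches name.

    The comparison ignores case and surrounding whitespace. Returns None
    when no heading matches.
    """
    wanted = name.strip().lower()
    for section in response["parse"]["sections"]:
        if section["line"].strip().lower() == wanted:
            return section["index"]
    return None


def section_wikitext_url(page, index):
    """URL that returns the wikitext of a single section."""
    params = {
        "action": "parse",
        "format": "json",
        "page": page,
        "prop": "wikitext",
        "section": index,
    }
    return API + "?" + urlencode(params)
```

Note that the heading text in "line" can contain HTML markup (italics, for instance), so an exact match may need some extra cleanup on real pages.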
I'm looking at the API documentation here,
https://www.mediawiki.org/wiki/API:Query
Getting the wikitext for a page is mentioned in the beginning of the documentation,
The action=query module allows you to get information about a wiki and the data stored in it, such as the wikitext of a particular page, the links and categories of a set of pages, or the token you need to change wiki content.
but I can't seem to figure out what parameters to pass in the API request to return the wikitext for a given page. Does anyone know how to do this?
I've tried parameters like,
{'action':'query', 'titles':'Anarchism', 'prop':'wikitext', 'format':'json'}
Use this query:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Anarchism&rvslots=main
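The wikitext sits fairly deep inside the JSON this query returns. A small helper like the one below can dig it out. This is a sketch assuming the default response layout (formatversion=1), where "pages" is an object keyed by page ID and the text is stored under the "*" key of the main slot; with formatversion=2 the layout differs ("pages" becomes a list and the text is under "content"). The function name extract_wikitext is made up for illustration.

```python
def extract_wikitext(response):
    """Map each page title to its wikitext from a decoded query response."""
    result = {}
    for page in response["query"]["pages"].values():
        revisions = page.get("revisions")
        if not revisions:
            # Missing or invalid pages come back without revisions.
            continue
        result[page["title"]] = revisions[0]["slots"]["main"]["*"]
    return result
```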
I'm wondering what the easiest way would be to extract only the information contained within a certain template using the Wikimedia API.
I'd like to extract the information contained in the template "Template:Mycomorphbox" for this page: http://en.wikipedia.org/wiki/Amanita_phalloides
I'm a bit frustrated that it seems like I have to pull the entire content of the page to get the information that I need. Surely there has to be a better way.
Indeed there is a better way. You must not extract information from templates (or from wikitext in general). That's not your job nor your application's, it's MediaWiki's.
Use Wikidata, which is where the structured information from and for Wikipedia is stored. See the Wikibase API documentation and see some of the properties used for biology stuff or ask if something is unclear.
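As a rough illustration of the Wikidata route, the sketch below builds a wbgetentities request for the item linked to an English Wikipedia title and reads the values of one property from the reply. The helper names are hypothetical, and the property ID is a placeholder: you have to look up which properties (if any) hold the data you need for a given item.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def entity_url(title, site="enwiki"):
    """URL that fetches the claims of the item linked to a wiki page."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "claims",
        "format": "json",
    }
    return API + "?" + urlencode(params)


def claim_values(response, property_id):
    """Collect the main values stored for property_id in a decoded response."""
    values = []
    for entity in response.get("entities", {}).values():
        for claim in entity.get("claims", {}).get(property_id, []):
            snak = claim.get("mainsnak", {})
            # "novalue" and "somevalue" snaks carry no datavalue.
            if snak.get("snaktype") == "value":
                values.append(snak["datavalue"]["value"])
    return values
```

The shape of each value depends on the property's data type: plain strings for string properties, and small objects (with an "id" field, for example) for properties that point at other items.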
As you surely know, web.archive.org lets you inspect the history of a domain, e.g. http://web.archive.org/web/*/besttatoo.com
It also has an API: http://archive.org/help/json.php
I need to get data from the API, but I can't find much information on how to use it. Has anyone used it who can paste some usage examples?
This link provides details about the item LovingU on archive.org:
http://archive.org/details/LovingU&output=json
To create an API query to your liking, use this page:
https://archive.org/advancedsearch.php#raw
That page allows you to choose your output format (JSON, XML, HTML, CSV or RSS) and also the parameters you want to see. You can limit the number of results, too.
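For reference, the kind of URL that form generates can also be assembled by hand. The sketch below assumes the usual advancedsearch.php parameters (q for the query, repeated fl[] entries for the fields to return, rows, page and output) and a reply whose results sit under "response" and then "docs"; the function names are made up for illustration.

```python
from urllib.parse import urlencode

ENDPOINT = "https://archive.org/advancedsearch.php"


def advanced_search_url(query, fields=("identifier", "title"), rows=10, page=1):
    """Build an advanced-search URL that asks for JSON output."""
    params = [("q", query)]
    # Each requested field is sent as its own fl[] parameter.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return ENDPOINT + "?" + urlencode(params)


def result_docs(response):
    """Return the list of matching items from a decoded JSON reply."""
    return response.get("response", {}).get("docs", [])
```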
I want to fetch the tweets that users of my iPhone app post through my Twitter app.
I found this URL - http://search.twitter.com/search.json?q=serachKeyword&result_type=recent
This returns a dictionary containing an array of tweets that match serachKeyword.
But I want to fetch only the tweets that were posted via my Twitter app.
Is there any way to fetch only my Twitter application's tweets, not all of them?
Can I filter the search results, or something like that?
The Twitter API says yes, you can filter by source.
The Search Operators section says:
news source:tweet_button | containing "news" and entered via the Tweet Button
The Source section says:
• can only be combined with a keyword parameter. If you do not include a keyword you will receive an HTTP 403 error with the message: {"error":"You must enter a query."}.
• supports multi-word sources by using _ instead of spaces. For example, the source "Tweet Button" should be entered as source: tweet_button
So you must have a keyword parameter and if your client name has spaces in it replace them with underscores.
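Those two rules can be captured in a few lines of Python. This is only a sketch of building the query string: build_source_query is a made-up helper, and lowercasing the client name simply follows the "Tweet Button" to tweet_button example above rather than any documented requirement.

```python
from urllib.parse import urlencode


def build_source_query(keyword, client_name):
    """Combine a keyword with a source: operator for the given client."""
    keyword = keyword.strip()
    if not keyword:
        # The source operator is rejected without an accompanying keyword.
        raise ValueError("a keyword is required alongside source:")
    # Multi-word client names use underscores instead of spaces.
    source = "_".join(client_name.lower().split())
    return f"{keyword} source:{source}"


def search_query_string(keyword, client_name):
    """URL-encode the query for use as the search request's parameters."""
    query = build_source_query(keyword, client_name)
    return urlencode({"q": query, "result_type": "recent"})
```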
There is no complete method to do this. As @JoePasq answered, you can search on source, but you must include a keyword, and Twitter search is filtered for quality and relevance.
Search is focused on relevance and not completeness. This means that some Tweets and users may be missing from search results.
The Search API is not a complete index of all Tweets, but instead an index of recent Tweets.
What you should do is have your application post a request to a server you control every time a user posts a tweet with the status_id. This way you can store a complete database of all tweets posted from your app and query the data as needed.
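On the server side, the storage part of that idea could look something like the sketch below. It uses SQLite purely for illustration; the table layout and the function names (open_store, record_tweet, tweets_by_user) are assumptions, and the HTTP endpoint your app would post to is left out.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database that tracks tweets sent from the app."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "status_id TEXT PRIMARY KEY, "
        "user TEXT NOT NULL, "
        "posted_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_tweet(conn, status_id, user):
    """Store one status_id reported by the app; repeats are ignored."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO tweets (status_id, user) VALUES (?, ?)",
            (str(status_id), user),
        )


def tweets_by_user(conn, user):
    """Return every stored status_id for a user, sorted as text."""
    rows = conn.execute(
        "SELECT status_id FROM tweets WHERE user = ? ORDER BY status_id",
        (user,),
    )
    return [row[0] for row in rows]
```

Status IDs are stored as text here so that large IDs survive unchanged regardless of how the client serializes them.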