Constrain Wikipedia Search API to generate only NS:0 pages - api

I am calling the Wikipedia API from Java using the following search query to generate redirected pages:
https://en.wikipedia.org//w/api.php?format=json&action=query&generator=allpages&gapfilterredir=redirects&prop=links&continue=&gapfrom=D
where the final 'D' is just an example for the continue-from.
I am interested in only iterating over items in namespace:0. In fact, if I don't, the continue return value includes category pages, which break the next query iteration.
Thank you in advance.

The parameter you need from the Allpages api is
…&gapnamespace=0&…
but notice that when you omit it, then 0 is the default anyway.

Related

Karate : Is it feasible to iterate Url to support pagination

I have a rest endpoint with pagination, I want to verify expected parameter exists by traversing through all pages. One option I could think of is to have an array defined with page offset intervals like {0,25,50....}
Welcoming better approaches. Is it feasible to break the loop when my expected condition is met?
ex: Given url 'http://myhost.com/v1/cats/'+'#(offset)'
And request {name : 'Billie'}
When method post
Then status 201
Note: have not tested the above code, looking for better approach.
You should be able to figure this out with conditional logic: https://github.com/intuit/karate#conditional-logic
There should be some part of your response that tells you whether there is a "next" page or not. There are ways you can "loop" manually, refer this part of the docs: https://github.com/intuit/karate#polling

How can I get page id, wikidata id of some title along with multiple languages in a single API call?

I have been trying to call Wikipedia API to retrieve page id and wikidata item id using below call and it works fine.
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&redirects=1&format=xml&titles=Cat
but I need to retrieve the same information from other languages of my choice for example if I mention German and French languages in my call, it should look for their translation of word Cat and retrieve their page info. There is langlink property in Wikipedia API but somehow it doesn't work with query action along with pageprop.
So ideally, I want something like this:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&prop=langlinks&lllang=de&lllang=fr&titles=Cat
Any help would be appreciated.
Using lllang twice will just result in the second value overwriting the first one. You'll have to omit the paramter and then you get all the links:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops|langlinks&ppprop=wikibase_item&titles=Cat

How to get all Wikipedia page links with their pageIDs?

Starting a request like that:
https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Title&prop=links&pllimit=500
provides me a list of links (that the page contains) where every link consists of the title and the ns (namespace)
Is there a way to also get the PageID together with title & ns? (the less work it is for the sever the better of course)
You need to use generator parameter. Here is an example for Cobra Wikipedia page.
https://en.wikipedia.org/w/api.php?action=query&generator=links&titles=Cobra&prop=info&gpllimit=500

querying categorymembers with wikimedia and the size

I try to get the page sizes of all category members through the wikimedia api with only one request.(or less then 10).
I know I would get the sizes of pages by:
(1) Requesting every page separately and get the size
or
(2) A search query like this:
http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=physics
The result is several pages with the size and word count property.
Now how can I get the size and word count for a category member with a query like this or with another trick ?
http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics
Any hints shared would be appreciated.
You can use a category query as a generator, using generator=categorymember and gcmtitle=Category:Physics. This will execute the query action for each and every page in that category:
api.php?action=query
&generator=categorymembers
&gcmtitle=Category:Lakes
&prop=info
In the docs you can see what properties can be used as generators: categories, links and templates. Also, more or less every list module can be used as a generator in the same fashion.
Note that parameter names are prefixed with a g when used for a generator, so that cmtitle in the example above becomes gcmtitle, to distinguish them from parameters to the query action (that is applied to every page returned by the generator), prop and inprop, parameters

How to get the result of "all pages with prefix" using Wikipedia api?

I wish to use Wikipedia api to extract the result of this page:
http://en.wikipedia.org/wiki/Special:PrefixIndex
When searching "something" on it, for example this:
http://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&prefix=tal&namespace=4
Then, I would like to access each of the resulting pages and extract their information.
What api call might I use?
You can use list=allpages and specify apprefix. For example:
http://en.wikipedia.org/w/api.php?format=xml&action=query&list=allpages&apprefix=tal&aplimit=max
This query will give you the id and title of each article that starts with tal. If you want to get more information about each page, you can use this list as a generator:
http://en.wikipedia.org/w/api.php?format=xml&action=query&generator=allpages&gapprefix=tal&gaplimit=max&prop=info
You can give different values to the prop parameter to get different information about the page.