Wikipedia API get links to all images of preferred redirected articles in one API call? - wikipedia-api

I want to get the links to all the images on the default redirect pages for, say, "Hypertensive Disease" and "pulmonary edema." Right now, I am doing this through three API calls for each term, e.g.:
http://en.wikipedia.org/w/api.php?action=query&titles=hypertensive_disease&redirects&prop=links&format=json&indexpageids
http://en.wikipedia.org/w/api.php?action=query&pageids=RESULT_FROM_1&prop=images&format=json
...api.php?action=query&titles=RESULTS_FROM_2&prop=imageinfo&iiprop=url&iiurlwidth=220&format=json
Where 1 gets the IDs for the redirects, 2 gets the image names, and 3 gets the image URLs.
Is there a way to be nicer to Wikipedia and do this with one API call?
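For reference, the three-call flow described above might look roughly like this in Python, using the requests library (an assumption; the helper name and the lack of error handling are mine):

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def image_urls_three_calls(term):
    # 1. Resolve the redirect and get the target page id.
    r1 = requests.get(API, params={
        "action": "query", "titles": term, "redirects": "",
        "prop": "links", "format": "json", "indexpageids": "",
    }).json()
    page_id = r1["query"]["pageids"][0]

    # 2. Get the names of the images used on that page
    #    (prop=images returns at most 10 by default).
    r2 = requests.get(API, params={
        "action": "query", "pageids": page_id,
        "prop": "images", "format": "json",
    }).json()
    image_titles = [img["title"]
                    for img in r2["query"]["pages"][page_id].get("images", [])]

    # 3. Get a URL (and a 220px thumbnail URL) for each image file.
    r3 = requests.get(API, params={
        "action": "query", "titles": "|".join(image_titles),
        "prop": "imageinfo", "iiprop": "url", "iiurlwidth": "220",
        "format": "json",
    }).json()
    return [p["imageinfo"][0]["url"]
            for p in r3["query"]["pages"].values() if "imageinfo" in p]

print(image_urls_three_calls("hypertensive_disease"))
```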

It looks like you can make use of a generator, so the single API call to get all images from the page that "hypertensive disease" redirects to ("Hypertension") would look like this:
http://en.wikipedia.org/w/api.php?action=query&titles=hypertensive_disease&redirects&generator=images&prop=imageinfo&iiprop=url&format=json&indexpageids
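A minimal sketch of that single call in Python with the requests library (an assumption), reading the image URLs out of the generated File pages:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

resp = requests.get(API, params={
    "action": "query",
    "titles": "hypertensive_disease",
    "redirects": "",          # follow the redirect to "Hypertension"
    "generator": "images",    # use the page's images as the result set
    "prop": "imageinfo",
    "iiprop": "url",
    "format": "json",
    "indexpageids": "",
}).json()

# Each generated page is a File: page carrying its imageinfo/url.
for page in resp["query"]["pages"].values():
    if "imageinfo" in page:
        print(page["imageinfo"][0]["url"])
```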

Related

How to get image URLs in different pages using a single WIKI api call?

We can get the text extracts of different pages in a single wiki API call by using the pipe character (|).
For example: http://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exlimit=max&explaintext&exintro&titles=Yahoo|Google&redirects=
This API call returns data about both Google and Yahoo in text format, in a single request.
I want to get the image URLs of both Google and Yahoo in a single wiki API call.
Is there any method to get all the image URLs of different pages in a single wiki API call?
Yes, just switch prop=extracts to prop=images. Works exactly the same way:
http://en.wikipedia.org/w/api.php?action=query&titles=Yahoo|Google&prop=images
The full documentation is here: http://www.mediawiki.org/wiki/API:Properties
To get the url of an image, use prop=imageinfo&iiprop=url for the corresponding file page.
Finally, you can combine it all in one single request, by using the prop=images result as a generator for the prop=imageinfo call:
http://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=url&generator=images&titles=Google|Yahoo!
You will get some warnings because format=json is missing, so the correct request is:
https://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&format=json&iiprop=url&generator=images&titles=Google|Yahoo
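In Python with the requests library (an assumption), the same combined request for both pages might look like this:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

resp = requests.get(API, params={
    "action": "query",
    "titles": "Google|Yahoo",   # pipe-separated titles, as above
    "generator": "images",      # run imageinfo over every image on both pages
    "prop": "imageinfo",
    "iiprop": "url",
    "format": "json",
}).json()

for page in resp.get("query", {}).get("pages", {}).values():
    for info in page.get("imageinfo", []):
        print(info["url"])
```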

Accessing full url of all page images Wikipedia API

I'm experimenting with the Wikipedia API and was trying to get the full URLs for all images on a particular page, in this example Google's main page (http://en.wikipedia.org/wiki/Google).
I found the page id through another API call and then attempted to use this information in the following call to get the full URLs of all images on that page:
http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml
This returns some of the page's images, but I cannot understand why I am not getting all of them, specifically the logo, which is what I am most interested in. I am aware that similar questions have been asked, but I was not able to find one that helps here.
The API does not give you all results at once; it defaults to 10 results. At the start of the response you will see a value for the gimcontinue parameter. If you pass it back like this, you get the next batch of images: http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml&gimcontinue=1092923|Google_bike.jpg
Alternatively, you can ask for more images at once using gimlimit like this: http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml&gimlimit=500
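If you prefer to script the paging, here is a hedged sketch in Python with the requests library (an assumption): the loop keeps merging the continuation values the API returns into the next request until no continue block comes back.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "pageids": "1092923",
    "generator": "images",
    "gimlimit": "max",           # ask for as many as allowed instead of the default 10
    "prop": "imageinfo",
    "iiprop": "url|dimensions|mime",
    "format": "json",
}

urls = []
while True:
    data = requests.get(API, params=params).json()
    for page in data.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            urls.append(info["url"])
    if "continue" not in data:       # no more batches
        break
    params.update(data["continue"])  # carries gimcontinue for the next batch

print(len(urls), "image URLs")
```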

Get all page ids linked to a given wikipedia page

I am trying to use the Wikimedia public APIs to access the English Wikipedia database.
I would like to have a way to obtain all the page ids linked to a given page.
If I do this:
http://en.wikipedia.org/w/api.php?action=query&titles=computer&format=xml
I am only able to obtain the page id of the 'computer' page.
I know I could parse for the 'href' tags inside that page and make n queries, but it is not very efficient.
Can I achieve this through the API alone?
It looks like you're looking for the backlinks module.
With that, you can do something like:
http://en.wikipedia.org/w/api.php?action=query&bltitle=computer&list=backlinks&format=xml
Also, the API uses paging, so you'll most likely need to add &bllimit=max to the query and then make follow-up requests to get the remaining pages.
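A sketch of that in Python with the requests library (an assumption), collecting the backlink page ids across the paged results:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "list": "backlinks",
    "bltitle": "computer",
    "bllimit": "max",     # as many results per request as allowed
    "format": "json",
}

page_ids = []
while True:
    data = requests.get(API, params=params).json()
    page_ids.extend(link["pageid"] for link in data["query"]["backlinks"])
    if "continue" not in data:       # all pages fetched
        break
    params.update(data["continue"])  # carries blcontinue for the next page

print(len(page_ids), "pages link to 'computer'")
```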

How to use regular urls without the hash symbol in spine.js?

I'm trying to achieve URLs in the form of http://localhost:9294/users instead of http://localhost:9294/#/users
This seems possible according to the documentation, but I haven't been able to get it working for "bookmarkable" URLs.
To clarify, browsing directly to http://localhost:9294/users gives a 404 "Not found: /users"
You can turn on HTML5 History support in Spine like this:
Spine.Route.setup(history: true)
Passing the history: true argument to Spine.Route.setup() enables the fancy URLs without the hash.
The documentation for this is actually buried a bit, but it's here (second to last section): http://spinejs.com/docs/routing
EDIT:
In order to have URLs that can be navigated to directly, you will have to handle this on the server side. For example, with Rails, you would have to build a way to take the path of the URL (in this case "/users") and pass it to Spine accordingly. Here is an excerpt from the Spine docs:
However, there are some things you need to be aware of when using the History API. Firstly, every URL you send to navigate() needs to have a real HTML representation. Although the browser won't request the new URL at that point, it will be requested if the page is subsequently reloaded. In other words you can't make up arbitrary URLs, like you can with hash fragments; every URL passed to the API needs to exist. One way of implementing this is with server side support.
When browsers request a URL (expecting a HTML response) you first make sure on server-side that the endpoint exists and is valid. Then you can just serve up the main application, which will read the URL, invoking the appropriate routes. For example, let's say your user navigates to http://example.com/users/1. On the server-side, you check that the URL /users/1 is valid, and that the User record with an ID of 1 exists. Then you can go ahead and just serve up the JavaScript application.
The caveat to this approach is that it doesn't give search engine crawlers any real content. If you want your application to be crawl-able, you'll have to detect crawler bot requests, and serve them a 'parallel universe of content'. That is beyond the scope of this documentation though.
It's definitely a good bit of effort to get this working properly, but it CAN be done. It's not possible to give you a specific answer without knowing the stack you're working with.
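To make that server-side piece concrete, here is a hedged sketch using Flask purely as a stand-in stack (an assumption; the answer above mentions Rails, and the next answer uses .htaccess rewrites): a catch-all route checks that the requested path is valid and then serves the same application shell, leaving the actual routing to Spine on the client.

```python
from flask import Flask, send_file, abort

app = Flask(__name__)

# Hypothetical check that the pushState URL maps to real content,
# e.g. that the requested resource actually exists.
def is_valid_path(path):
    return path == "users" or path.startswith("users/")

@app.route("/")
@app.route("/<path:path>")
def spa(path=""):
    if path and not is_valid_path(path):
        abort(404)
    # Serve the same application shell for every valid URL; the
    # client-side router (Spine) reads the URL and invokes the route.
    return send_file("index.html")

if __name__ == "__main__":
    app.run(port=9294)
```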
I used the following rewrites as explained in this article.
http://www.josscrowcroft.com/2012/code/htaccess-for-html5-history-pushstate-url-routing/

Canonical links and paging

Google has been pushing its new canonical link feature, and I agree it is really useful. Now, instead of having a ton of entry points into an area, you can have one.
I was wondering, does this feature play nice with paging?
For example: I have this page, which has 8 pages of content. If I specify http://community.mediabrowser.tv/permalinks/154/iso-always-detected-as-a-movie-when-checking-metadata as the canonical for the page, will there be any undesired side effects? Will this be better overall? Will this mean that a hit on page 5 will take users to page 1?
When specifying a canonical URL, it should have substantially the same content. Pages 2-8 have different content. Yes, if Google were to honor your canonical link on page 5, it would send users to page 1.
You should use the canonical link on page 1 so that Google knows that http://community.mediabrowser.tv/topics/154 and http://community.mediabrowser.tv/topics/154?page=1&response_type=3 are the same as http://community.mediabrowser.tv/permalinks/154/iso-always-detected-as-a-movie-when-checking-metadata
You may also want to put canonical links on the other pages so Google knows that http://community.mediabrowser.tv/topics/154?page=5 is the same as http://community.mediabrowser.tv/topics/154?page=5&response_type=3
You should only add canonical links on pages with identical content. For example, a set of links presented in a different order: sorted by date or alphabetically.
In your case all pages have different content (albeit representing several pages of the same article or conversation thread), which means you don't need to canonicalize them.
Still, if you do, all that happens is that Google gives more priority to the first page than to the other pages when displaying them in search results.
Canonical links do not affect your visitors. They only suggest priority and possible duplicate content to bots.
More info from Google here