Get all page ids linked to a given Wikipedia page - API

I am trying to use the Wikimedia public APIs for accessing the English Wikipedia database.
I would like to have a way to obtain all the page ids linked to a given page.
If I make a request like this:
http://en.wikipedia.org/w/api.php?action=query&titles=computer&format=xml
I am only able to obtain the page id of the 'computer' page.
I know I could parse for the 'href' tags inside that page and make n queries, but that is not very efficient.
Can I achieve this through the APIs alone?

It looks like you're looking for the backlinks module.
With that, you can do something like:
http://en.wikipedia.org/w/api.php?action=query&bltitle=computer&list=backlinks&format=xml
Also, the API uses paging, so you'll most likely need to add &bllimit=max to the query and then make follow-up requests with the returned continuation value (blcontinue) to get the remaining pages.
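For illustration, here is a minimal Python sketch (assuming the requests library) that collects the page ids of every page linking to "Computer", following the API's continuation values until the list is exhausted:

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "list": "backlinks",
    "bltitle": "Computer",
    "bllimit": "max",   # up to 500 results per request for normal users
    "format": "json",
}

page_ids = []
while True:
    data = requests.get(API_URL, params=params).json()
    page_ids.extend(link["pageid"] for link in data["query"]["backlinks"])
    if "continue" not in data:
        break
    params.update(data["continue"])   # carries blcontinue for the next batch

print(len(page_ids), "pages link to Computer")
```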

Related

Search in multiple sites using Google Custom Search JSON

Trying to figure out how I can search in multiple sites using the Google Custom Search JSON API.
Meaning that the search will only cover a specific list of sites.
I was playing with the API explorer - https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list?apix_params=%7B%22cx%22%3A%22011602274690322925368%3Atkz2zvvpmk0%22%2C%22siteSearch%22%3A%22www.walla.co.il%22%7D
and noticed the siteSearch query key, but it only accepts a single string, not a list of sites:
What is the way to search only in specific sites?
Thanks
There are a couple of things you can do.
If you know the specific sites you want to search, you can add them as refinements to your engine. Then query for that refinement by adding 'more:<REFINEMENT_LABEL>' to the query.
Or, add 'site:' operators to the query itself. For example: cats site:cnn.com OR site:bbc.com
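A rough sketch of the second approach against the Custom Search JSON API (the key and engine id below are placeholders for your own credentials):

```python
import requests

API_KEY = "YOUR_API_KEY"      # placeholder
ENGINE_ID = "YOUR_ENGINE_ID"  # placeholder (the cx value)

resp = requests.get(
    "https://www.googleapis.com/customsearch/v1",
    params={
        "key": API_KEY,
        "cx": ENGINE_ID,
        # site: operators restrict the query to the listed sites
        "q": "cats site:cnn.com OR site:bbc.com",
    },
)
for item in resp.json().get("items", []):
    print(item["link"])
```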

Get wikitext from the Wikipedia API?

I'm looking at the API documentation here,
https://www.mediawiki.org/wiki/API:Query
Getting the wikitext for a page is mentioned in the beginning of the documentation,
The action=query module allows you to get information about a wiki and the data stored in it, such as the wikitext of a particular page, the links and categories of a set of pages, or the token you need to change wiki content.
but I can't seem to figure out what parameters to pass in the API request to return the wikitext for a given page. Anyone know how to do this?
I've tried parameters like,
{'action':'query', 'titles':'Anarchism', 'prop':'wikitext', 'format':'json'}
You must use this query:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Anarchism&rvslots=main
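For example, a small Python sketch of that query; with rvslots=main the wikitext sits under slots.main of the first revision, and the pages object is keyed by page id, hence the loop:

```python
import requests

resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": "Anarchism",
        "format": "json",
    },
)
pages = resp.json()["query"]["pages"]   # keyed by page id
for page in pages.values():
    wikitext = page["revisions"][0]["slots"]["main"]["*"]
    print(wikitext[:200])
```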

Accessing full URLs of all page images via the Wikipedia API

I'm experimenting with the Wikipedia API and was trying to get the full URLs for all images on a particular page, in this example Google's article (http://en.wikipedia.org/wiki/Google).
I found the page id through the use of another API and then attempted to use this information in the following API to get the full urls of all images on that page:
http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml
I get some of the page images from this but cannot understand why I am not getting all of them - specifically the logo, which is what I was most interested in. Apologies, I am aware that similar questions have been asked, but I was not able to find one that would assist me here.
The API does not give you all results at once; it defaults to 10 results. You can see at the beginning of the response that there is a value for the parameter gimcontinue. If you use it like this you get more images: http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml&gimcontinue=1092923|Google_bike.jpg
Alternatively, you can ask for more images at once using gimlimit like this: http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml&gimlimit=500
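Putting the two together, here is a Python sketch (assuming the requests library) that asks for the maximum batch size and keeps following the continuation until the generator is exhausted, which is the only way to be sure files such as the logo are included:

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "pageids": "1092923",        # the Google article
    "generator": "images",
    "gimlimit": "max",
    "prop": "imageinfo",
    "iiprop": "url|dimensions|mime",
    "format": "json",
}

urls = []
while True:
    data = requests.get(API_URL, params=params).json()
    for page in data.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            urls.append(info["url"])
    if "continue" not in data:
        break
    params.update(data["continue"])   # carries gimcontinue for the next batch

print("\n".join(urls))
```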

Infinite amount of Google custom search boxes per website?

I've got a site where users can create groups (we call them games)
www.ongoingworlds.com/games/270/
www.ongoingworlds.com/games/287/ etc
Each of these games has its own user-generated content. I want to use a Google custom search for each game. But I can't see an easy way to amend the embed code to add a dynamic path, and I don't want to have to register multiple (hundreds of) GCSEs separately to get an embed code for each.
What would be the best way of allowing each of these URLs (above) to have their own GCSE?
You can search subparts of your site by using a combination of the site: operator and the webSearchQueryAddition parameter on the gcse element.
webSearchQueryAddition appends an additional search term to your user's query. If for each of the "games" you change webSearchQueryAddition to point to the "game" base URL, the search results will be restricted to that URL. You can inject that parameter programmatically, e.g. with JavaScript, for each of the "games".
Documentation is here: https://developers.google.com/custom-search/docs/element#supported_attributes
And here is a working example:
http://jsfiddle.net/t2s5M/

What should I add to my site to make Google index the subpages as well

I am a beginner web developer and I have a site, JammuLinks.com, built on PHP. It is a city local listing search engine. Basically I've written search pages which take in a parameter, fetch the records from the database and display them. So the content is generated dynamically. However, if you look at the bottom of the site, I have added many static links where I have hard-coded the parameters in the link, like searchresult.php?tablename='schools'. So my questions are:
Since Google crawls the page and also the links listed on the page, will it crawl the results page data as well? How can I tell if it has? So far I have tried site:www.jammulinks.com but it returns only the homepage and the blog.
What more can I add to make the static links be indexed as well?
The best way to do this is to create a sitemap document (you can even get the template from Google's webmaster section of their site, www.google.com/webmasters/, I believe).
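If you would rather generate it yourself, here is a small Python sketch that writes a minimal sitemap.xml for those hard-coded category links; the table names are illustrative and should be replaced with the ones your site actually uses:

```python
# Illustrative table names - replace with the ones behind your static links.
tables = ["schools", "hospitals", "restaurants"]

urls = [
    f"http://www.jammulinks.com/searchresult.php?tablename={t}"
    for t in tables
]

entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)

# Write the file to the web root so it can be submitted in Google's webmaster tools.
with open("sitemap.xml", "w") as f:
    f.write(sitemap)
```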