How to get all images in a category that are not in another one with the MediaWiki API? - api

I am entirely new to APIs, so sorry if the question is silly.
I would like to get all images in a category on Commons, let's say X, but exclude those which are also in another one (Y). I do not understand whether I can actually do this.
https://commons.wikimedia.org/w/api.php?action=query&list=categorymembers&cmtype=file&cmtitle=Category:X
will get all of them, but how do I exclude some?
Moreover, I would like the result to include the description of each image, not just the name of the file. Is that possible?

MediaWiki has, by default, no built-in support for building or querying category intersections. To accomplish this task, you need extensions, external tools, or multiple API queries followed by result processing.
CirrusSearch API
On Wikimedia Commons, as on the whole Wikimedia wiki farm, CirrusSearch powers filtered search, including search for category intersections, and it is also available through the API (action=query&list=search&srsearch=incategory:A+-incategory:B returns Category:A minus Category:B).
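As a rough sketch of how that search query could be assembled in Python, assuming the Commons endpoint from the question: the function name, the quoting of category names, and the srnamespace=6 restriction to the File namespace are illustrative choices, not part of the answer above.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_difference_search_url(include_cat, exclude_cat, limit=50):
    """Build a list=search URL for files in include_cat but not in exclude_cat."""
    # Quoting keeps category names that contain spaces as a single token.
    search = f'incategory:"{include_cat}" -incategory:"{exclude_cat}"'
    params = {
        "action": "query",
        "list": "search",
        "srsearch": search,
        "srnamespace": 6,  # assumption: only the File namespace is wanted
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


print(build_difference_search_url("A", "B"))
```

The resulting URL can be fetched with any HTTP client; longer result sets would still have to be paged through with the continuation values the API returns.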
FastCCI
One tool I can recommend (because it is a dedicated high-performance solution and is actually running) is fastcci, developed by Daniel Schwen. For Wikimedia Commons there is already a maintained database and a running web service, but it can be set up for any wiki, provided the tool has a host to run on and database access.
Query
Consider the following query URL:
https://fastcci.wmflabs.org/?c1=3302993&c2=15516712&d1=0&d2=0&s=200&a=not&t=js
https://fastcci.wmflabs.org/ - Host that the Wikimedia Commons fastcci instance runs on
c1 - ID of category 1
c2 - ID of category 2
d1 - depth of category 1 to search in (fastcci by default considers sub-categories)
d2 - depth of category 2 to search in (fastcci by default considers sub-categories)
s - Number of results to return
o - Offset
a - conjunction (a=not in the query above, to exclude the members of category 2)
t - connection type (t=js for a JSONP response; otherwise assumes being used as websocket)
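The parameter list above can be turned into a small URL builder. This is only a sketch: the function and argument names are made up for illustration, and the offset is left out when it is zero so that the defaults reproduce the sample query.

```python
from urllib.parse import urlencode

FASTCCI_URL = "https://fastcci.wmflabs.org/"


def build_fastcci_url(cat1_id, cat2_id, depth1=0, depth2=0,
                      size=200, offset=0, action="not"):
    """Build a fastcci query URL from two category page IDs."""
    params = [
        ("c1", cat1_id), ("c2", cat2_id),
        ("d1", depth1), ("d2", depth2),
        ("s", size),
    ]
    if offset:
        params.append(("o", offset))  # only sent when paging past the first batch
    params += [("a", action), ("t", "js")]
    return FASTCCI_URL + "?" + urlencode(params)


print(build_fastcci_url(3302993, 15516712))
```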
Response
fastcciCallback( [ 'RESULT 27572680,0,0|1675043,0,0|27577015,0,0|27577043,0,0|27577106,0,0|27576896,0,0|27576790,0,0|23481936,0,0|17560964,0,0|11009066,0,0', 'OUTOF 10', 'DBAGE 378310', 'DONE'] );
RESULT is followed by a |-separated list of up to 50 integer triplets of the form pageId,depth,tag. Each triplet stands for one image or category.
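A response in this shape could be unpacked roughly as follows. The sketch assumes the callback payload is always a plain list of quoted strings like the sample (it is not valid JSON because of the single quotes, so ast.literal_eval is used instead); the function name and the returned dictionary layout are illustrative.

```python
import ast


def parse_fastcci_jsonp(body):
    """Parse a fastcciCallback([...]) body into page triplets and metadata."""
    # Cut out the list literal between the outermost parentheses.
    payload = body[body.index("(") + 1:body.rindex(")")].strip()
    lines = ast.literal_eval(payload)

    result = {"pages": [], "outof": None, "dbage": None}
    for line in lines:
        key, _, value = line.partition(" ")
        if key == "RESULT":
            for triplet in value.split("|"):
                page_id, depth, tag = (int(part) for part in triplet.split(","))
                result["pages"].append((page_id, depth, tag))
        elif key == "OUTOF":
            result["outof"] = int(value)
        elif key == "DBAGE":
            result["dbage"] = int(value)
    return result


sample = ("fastcciCallback( [ 'RESULT 27572680,0,0|1675043,0,0', "
          "'OUTOF 10', 'DBAGE 378310', 'DONE'] );")
print(parse_fastcci_jsonp(sample)["pages"])
```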
Resources
Sample client side implementation - to see it in action, visit any category page on Commons and look next to the Good pictures button.
Example is FilesOf('Category:Saaleck') - FilesOf('Category:Rapeseed fields in Saxony-Anhalt')
Server application
Presentation on YouTube
Slides
A note on pageIDs
page IDs → page titles: GET /w/api.php?action=query&pageids=page_IDs_separated_by_pipe
page titles → page IDs: GET /w/api.php?action=query&titles=Titles_separated_by_pipe
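Since fastcci returns page IDs, a lookup step is usually needed afterwards. The sketch below builds the page-ID lookup URLs in batches; the batch size of 50 reflects the usual per-request limit for ordinary accounts, and the function name and the Commons endpoint are assumptions.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_pageid_lookup_urls(page_ids, batch_size=50):
    """Return one action=query URL per batch of page IDs."""
    urls = []
    for start in range(0, len(page_ids), batch_size):
        batch = page_ids[start:start + batch_size]
        params = {
            "action": "query",
            "pageids": "|".join(str(page_id) for page_id in batch),
            "format": "json",
        }
        urls.append(f"{API_URL}?{urlencode(params)}")
    return urls


for url in build_pageid_lookup_urls([27572680, 1675043, 27577015]):
    print(url)
```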

AFAIK, there is no way to get that directly using the API. But, assuming both categories are reasonably small, you could get all images from both of them and then compute the complement in your code.
To retrieve the description, you can use prop=imageinfo&iiprop=extmetadata&iiextmetadatafilter=ImageDescription.
In the context of your example query, it would look like this:
https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmtype=file&gcmtitle=Category:X&prop=imageinfo&iiprop=extmetadata&iiextmetadatafilter=ImageDescription
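A minimal sketch of the client-side part, assuming the JSON responses have already been fetched (and continued, if the categories are large) and are in the API's default JSON layout. The first helper works on list=categorymembers entries, the second on the generator query above; both function names are made up for illustration.

```python
def titles_only_in_first(members_x, members_y):
    """Return titles from members_x that do not appear in members_y."""
    excluded = {member["title"] for member in members_y}
    return [m["title"] for m in members_x if m["title"] not in excluded]


def image_descriptions(response):
    """Map each file title to its ImageDescription value (None if absent)."""
    descriptions = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        info = page.get("imageinfo") or [{}]
        meta = info[0].get("extmetadata", {})
        descriptions[page["title"]] = meta.get("ImageDescription", {}).get("value")
    return descriptions


x = [{"title": "File:A.jpg"}, {"title": "File:B.jpg"}]
y = [{"title": "File:B.jpg"}]
print(titles_only_in_first(x, y))
```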

Related

How to get table info and summary of a page using the Wikipedia API?

I want to get minimal information of a Wikipedia page using MediaWiki API like DuckDuckGo. For example for Steve Carell: https://duckduckgo.com/?q=steve+carell&t=hp&ia=news&iax=about
How can I get this information with a Wikipedia url (eg https://en.wikipedia.org/wiki/Steve_Carell) in HTML format?
You can use the MediaWiki API for that. There's an extension, TextExtracts, which is exactly for that (and it is installed on Wikipedia).
In your case, e.g.:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exsentences=1&titles=Steve%20Carell
will return something like:
<p class=\"mw-empty-elt\">\n</p>\n\n<p class=\"mw-empty-elt\">\n \n</p>\n<p><b>Steven John Carell</b> (<span></span>; born August 16, 1962) is an American actor, comedian, producer, writer and director.</p>
You can also customize how many sentences (or characters) the API returns; please consult the API documentation for that.
There is also a way to retrieve the short description, which is stored at Wikidata (and visible in the mobile view of Wikipedia). This call would be:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&titles=Steve_Carell
This returns the following property in the pageprops of the page:
"wikibase-shortdesc": "American actor"
This may fit better depending on your use case.
You can even get both of the results with a single, combined, request:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts|pageprops&exsentences=1&titles=Steve_Carell
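If the combined request is made with format=json, the two values could be pulled out of the decoded response roughly like this. The sketch assumes the default JSON layout, where pages are keyed by page ID, and a single requested title; the function name and the returned keys are illustrative.

```python
def summarize_page(response):
    """Extract title, extract HTML and short description from one page."""
    # With a single title, the pages mapping holds exactly one entry.
    page = next(iter(response["query"]["pages"].values()))
    return {
        "title": page.get("title"),
        "extract": page.get("extract"),
        "short_description": page.get("pageprops", {}).get("wikibase-shortdesc"),
    }


# Hand-written stand-in for a decoded API response (the page key is made up).
example = {
    "query": {
        "pages": {
            "12345": {
                "title": "Steve Carell",
                "extract": "<p><b>Steven John Carell</b> ...</p>",
                "pageprops": {"wikibase-shortdesc": "American actor"},
            }
        }
    }
}
print(summarize_page(example)["short_description"])
```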

How to get all Wikipedia page links with their pageIDs?

A request like this:
https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Title&prop=links&pllimit=500
gives me a list of the links that the page contains, where every link consists of the title and the ns (namespace).
Is there a way to also get the page ID together with title and ns? (The less work it is for the server, the better, of course.)
You need to use the generator parameter. Here is an example for the Cobra Wikipedia page.
https://en.wikipedia.org/w/api.php?action=query&generator=links&titles=Cobra&prop=info&gpllimit=500
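With format=json added, the generator response lists each linked page as its own entry, so the three fields can be collected in one pass. A sketch under the assumption of the default JSON layout; linked pages that do not exist carry no pageid, which shows up as None here. The function name is made up.

```python
def linked_pages(response):
    """Return (pageid, ns, title) for every page in a generator=links response."""
    rows = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Missing (red-link) pages have no "pageid" key in the response.
        rows.append((page.get("pageid"), page["ns"], page["title"]))
    return rows


example = {
    "query": {
        "pages": {
            "-1": {"ns": 0, "title": "Some missing page", "missing": ""},
            "100": {"pageid": 100, "ns": 0, "title": "Some existing page"},
        }
    }
}
print(linked_pages(example))
```

Pages with more than 500 links would need follow-up requests using the continuation values returned by the API.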

Obtain Deviantart Deviation ID / UUID from page URL

I was looking at the Deviantart API to see what you can do with it.
A lot of requests require you to provide a deviation id to work with.
Take for instance adding a deviation to favorites (in Collections -> Add deviation to favorites above; I cannot post more than 2 links...).
Now I looked through the API to figure out how to obtain that id, but I did not find out how to do so.
If I only have the deviation URL, for instance http://kennyklent.deviantart.com/art/Pinkie-Pie-Dancing-296143815 , how can I tell its deviation-id?
I would have thought it was the number at the end, 296143815, but it is not.
If it helps, here's one example from the api's /browse/dailydeviations endpoint
"deviationid": "27FD366A-30CB-FC3E-DE54-9621E90FCE60",
"printid": "E984FC87-8B57-239C-FE7C-E2674A0DDFC4",
"url": "http://mudimba.deviantart.com/art/SF-Botanical-Gardens-57879397",
So this deviation SF-Botanical-Gardens-57879397 has the id 27FD366A-30CB-FC3E-DE54-9621E90FCE60 - but how would I find out if it wasn't listed in the examples?
Update 06/2017:
For anyone stumbling across this 2 years later, the answer below still works but there is now another way to get the UUID. Every Deviation now has a meta property da:appurl showing the UUID value on the deviation page itself.
To stay with the SF-Botanical-Gardens-57879397 example from above, looking at the page source at http://mudimba.deviantart.com/art/SF-Botanical-Gardens-57879397 reveals:
<meta property="da:appurl" content="DeviantArt://deviation/27FD366A-30CB-FC3E-DE54-9621E90FCE60">
Which contains exactly the UUID value 27FD366A-30CB-FC3E-DE54-9621E90FCE60
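Reading that meta property out of already-downloaded page source could look like the sketch below. It uses only the standard library HTML parser; the class and function names are made up, and it assumes the content value keeps the DeviantArt://deviation/UUID shape shown above.

```python
from html.parser import HTMLParser


class AppUrlParser(HTMLParser):
    """Remember the content of the <meta property="da:appurl"> tag."""

    def __init__(self):
        super().__init__()
        self.app_url = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("property") == "da:appurl":
            self.app_url = attributes.get("content")


def deviation_uuid(html):
    """Return the UUID from the da:appurl meta tag, or None if it is absent."""
    parser = AppUrlParser()
    parser.feed(html)
    if parser.app_url is None:
        return None
    # The UUID is the last path segment of the app URL.
    return parser.app_url.rsplit("/", 1)[-1]


page = ('<meta property="da:appurl" '
        'content="DeviantArt://deviation/27FD366A-30CB-FC3E-DE54-9621E90FCE60">')
print(deviation_uuid(page))
```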
Original answer
I got an answer from a Deviantart dev directly, http://comments.deviantart.com/1/492518964/3755610860
You cannot convert integer IDs into UUID format, you have to query the api to find the correct uuid. So for your example, you would query the /gallery/folders endpoint and then the gallery/{folderid} endpoint to get the list of deviations in that folder.
There's no easier way to obtain the UUID for a given URL for now.

Understanding RESTful. URIs for complex actions

I'm trying to build a RESTful service, and I've run into some problems. I'll describe them using an imaginary RESTful service as an example.
Say I need a "News" service on my site. News items come in two types: local news and global news. They are added by an administrator. A user can view local and global news, either separately or together. News is shown in pages, and a user can view a single news item.
So I've built this verb-noun table for the task:
GET /news - Get all news
POST /news - Create news
GET /news/{id} - Show the news with id={id}
PUT /news/{id} - Edit the news with id={id}
GET /news/{type}/{page}/{per_page} - Get news page #{page} of type {type}
GET /news/{page} - Get news page #{page} of both types
So, here are the problems:
1) How do I distinguish {page} from {id}? Maybe {id} can only be a number, while {page} is a string starting with 'p' (for example 'p1')?
2) The user can change the value of "per_page", i.e. how many news items are shown on a page. Isn't /news/{type}/{page}/{per_page} too complicated? How can it be simplified?
3) What should the URLs in the browser look like for this service? Won't they be the same as the URIs from the table above?
For example:
/news - Viewing news (1st page with default 'per_page' and default 'type')
/news/{type} - Viewing news (1st page with default 'per_page' and type={type})
/news/{id} - Viewing exact news with id={id}
/news/{type}/{page}/{per_page} - Viewing exact page of news of exact type.
4) Additional functionality, for example filtered search (getting news by date, author or title).
How can this be done with REST? How should the filter object (XML or JSON) be transmitted? What should the URL of the page with the filter results look like? /news/{date:12.12.2012,author:'admin'} or something better?
Sorry for my rough English. If you see grammar or other mistakes, feel free to correct them.
Thanks in advance.
I'd say you should use regular params for the type, page and per_page. Type, Page and Per_Page do not represent unique Resources, but are rather filters to the collection of News Resources. So I'd do
/news
/news/{id}
/news?type={type}&page={page}&per_page={per_page}
Same for additional filtering.
Make sure to check out http://www.ics.uci.edu/~fielding/pubs/dissertation/evaluation.htm#sec_6_2
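To illustrate the idea of treating type, page and per_page as filters on the collection, here is a small framework-independent sketch that reads them from the query string. The function name and the default values (page 1, 20 items per page) are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def news_filters(url):
    """Read the collection filters from a /news URL, applying defaults."""
    query = parse_qs(urlparse(url).query)
    return {
        "type": query.get("type", [None])[0],             # None means all types
        "page": int(query.get("page", ["1"])[0]),         # hypothetical default
        "per_page": int(query.get("per_page", ["20"])[0]),  # hypothetical default
    }


print(news_filters("/news?type=local&page=2&per_page=10"))
print(news_filters("/news"))
```

Because every filter is optional, /news, /news?type=local and /news?page=3 all resolve to the same collection resource, and /news/{id} stays unambiguous.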
As Gordon wrote, you can use request params as normal. Remember that REST doesn't mean only clean and nice URLs.
So, leave the id and type parameters in the URI, but put the pagination params in the query string.
Also, to distinguish different URI parts, you could use the pattern used in Google's gdata, i.e. each param is preceded by its name:
/news
/news/id/{id}
/news/type/{type}
With some parsing on the server side, you could add many parameters and optional parameters without enforcing an exact ordering.
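The server-side parsing for such name-prefixed segments can be quite small. The sketch below is one possible approach, not gdata's actual implementation; the function name and error handling are made up.

```python
def parse_named_segments(path, resource="news"):
    """Turn /news/name1/value1/name2/value2 into {"name1": "value1", ...}."""
    parts = [part for part in path.strip("/").split("/") if part]
    if not parts or parts[0] != resource:
        raise ValueError(f"path does not start with /{resource}")
    rest = parts[1:]
    if len(rest) % 2 != 0:
        raise ValueError("every parameter name needs a value")
    # Pair up names (even positions) with values (odd positions).
    return dict(zip(rest[0::2], rest[1::2]))


print(parse_named_segments("/news/type/local/page/2"))
print(parse_named_segments("/news/page/2/type/local"))
```

Both calls yield the same parameters, which is what makes the ordering irrelevant.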

Use custom function to populate gSpreadsheet cell based on a XML/JSON response

Ok, this one has become a little tricky for me and I really need some assistance to work through it.
Problem
I have a GSpreadsheet which has a list of data, in this case Twitter usernames. Using the API of a service provider (in this case the Klout API), I would like to retrieve information about that user to populate a cell within a spreadsheet.
Based on what I can work out so far, I would need to write a custom function to do this but I have no idea where to start, how I might construct it, or if there are any examples of doing this.
Scenario
The Klout API can return either an XML or JSON response (see http://developer.klout.com/docs/read/api/API), based on the string passed. For example, the URL:
http://api.klout.com/1/users/show.xml?key=SECRET&users=thewinchesterau
would return the following XML response:
<users>
<user>
<twitter_id>17439480</twitter_id>
<twitter_screen_name>thewinchesterau</twitter_screen_name>
<score>
<kscore>56.63</kscore>
<slope>0</slope>
<description>creates content that is spread throughout their network and drives discussions.</description>
<kclass_id>10</kclass_id>
<kclass>Socializer</kclass>
<kclass_description>You are the hub of social scenes and people count on you to find out what's happening. You are quick to connect people and readily share your social savvy. Your followers appreciate your network and generosity.</kclass_description>
<kscore_description>thewinchesterau has a low level ofinfluence.</kscore_description>
<network_score>58.06</network_score>
<amplification_score>29.16</amplification_score>
<true_reach>90</true_reach>
<delta_1day>0.3</delta_1day>
<delta_5day>0.5</delta_5day>
</score>
</user>
</users>
Based on this response, I would like to be able to populate different cells with the values returned within the XML (or JSON if easier) packet.
So, for example, I would have a spreadsheet like the following which would have custom functions to go out and retrieve the value of the relevant XML element response to populate the cell:
Cell  A              B            C              D                    E
1     Username       kscore       Network score  Amplification score  True reach
2     thewinchester  =kscore(A2)  =nscore(A2)    =ascore(A2)          =tscore(A2)
Questions
Are there any gSpreadsheet examples you know of that use an API to pull data in from an external source?
How would one write a custom function to fetch the result from the API and populate a cell with a result of a specific element?
Any information, examples or helpers you have are greatly appreciated.
You want the importXML function, documented here. The formula you want will look something like this:
=importXML("http://api.klout.com/1/users/show.xml?key=SECRET&users=" & A1, "//users/user/score/kscore")
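To check what that XPath selects outside the spreadsheet, the same lookup can be sketched in Python against the XML shown in the question. This assumes the response has already been downloaded as text; the function name and the choice of fields are illustrative. In ElementTree the root element is already <users>, so the path is written relative to it.

```python
import xml.etree.ElementTree as ET


def klout_scores(xml_text):
    """Pull the four values from the question's spreadsheet out of the XML."""
    root = ET.fromstring(xml_text)      # root is the <users> element
    score = root.find("./user/score")   # same node as //users/user/score
    if score is None:
        return None
    return {
        "kscore": float(score.findtext("kscore")),
        "network_score": float(score.findtext("network_score")),
        "amplification_score": float(score.findtext("amplification_score")),
        "true_reach": int(score.findtext("true_reach")),
    }


sample = (
    "<users><user><score>"
    "<kscore>56.63</kscore>"
    "<network_score>58.06</network_score>"
    "<amplification_score>29.16</amplification_score>"
    "<true_reach>90</true_reach>"
    "</score></user></users>"
)
print(klout_scores(sample))
```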
You could write a custom script with Google Apps Script, but there's a simpler solution, similar to what Nick Johnson posted. I've tested this against the score function, but it could easily be adapted to the show endpoint with a different XPath.
=importXML("http://api.klout.com/1/klout.xml?users="&A1&"&key=YOUR_API_KEY", "//users/user/kscore")
This presumes your Twitter IDs are in the A column.
Note that Google Docs limits the number of such importXML functions to 50 per spreadsheet. You could concatenate groups of 5 user IDs for each importXML call, effectively raising your limit to 250 per sheet.
This could also be adapted to a similar call in Excel that doesn't have that limit. Keep in mind the Klout ToS, though, using proper attribution and rate limits.