How to get more info within only one geosearch call via Wikipedia API? - api

I am using an API call similar to http://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=41.426140|26.099319.
It returns something like this:
<?xml version="1.0"?>
<api>
<query>
<geosearch>
<gs pageid="27460829" ns="0" title="Kostilkovo" lat="41.416666666667" lon="26.05" dist="4245.1" primary="" />
<gs pageid="27460781" ns="0" title="Belopolyane" lat="41.45" lon="26.15" dist="4988.7" primary="" />
<gs pageid="27460862" ns="0" title="Siv Kladenets" lat="41.416666666667" lon="26.166666666667" dist="5713.5" primary="" />
<gs pageid="13811116" ns="0" title="Svirachi" lat="41.483333333333" lon="26.116666666667" dist="6521.9" primary="" />
<gs pageid="27460810" ns="0" title="Gorno Lukovo" lat="41.366666666667" lon="26.1" dist="6613.4" primary="" />
<gs pageid="27460799" ns="0" title="Dolno Lukovo" lat="41.366666666667" lon="26.083333333333" dist="6746.2" primary="" />
<gs pageid="27460827" ns="0" title="Kondovo" lat="41.433333333333" lon="26.016666666667" dist="6937" primary="" />
<gs pageid="27460848" ns="0" title="Plevun" lat="41.45" lon="26.016666666667" dist="7383.1" primary="" />
<gs pageid="24179704" ns="0" title="Villa Armira" lat="41.499069444444" lon="26.106263888889" dist="8130" primary="" />
<gs pageid="27460871" ns="0" title="Zhelezari" lat="41.413333333333" lon="25.998333333333" dist="8540.1" primary="" />
</geosearch>
</query>
</api>
But what I am actually trying to get is some pictures for those pages, and that requires subsequent calls.
First, to get the images on a page:
http://en.wikipedia.org/w/api.php?action=query&prop=images&pageids=13843906
Then, to get the image info:
http://en.wikipedia.org/w/api.php?action=query&titles=File:Alexandru_Ioan_Cuza_Dealul_Patriarhiei.jpg&prop=imageinfo&iiprop=url
Well, even if this gets me what I ultimately need, it is not efficient at all.
I would like to know if there are some parameters for these calls, or maybe completely different call(s), that would bring back all this info in at most 2 steps/calls. It would be great, though, if it took only one.

Wow, I had no idea that such a feature exists nowadays! But to answer your question, since it's a list query, you can probably use it as a generator.
Let's try it:
Original geosearch query: http://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=41.426140|26.099319
Generator query to get images on matching pages: http://en.wikipedia.org/w/api.php?action=query&prop=images&imlimit=max&generator=geosearch&ggsradius=10000&ggscoord=41.426140|26.099319
The prop=images query can also be used as a generator, so you can also do this:
Get URLs for all images on a list of pages: http://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=url&generator=images&gimlimit=max&pageids=13811116|24179704|27460781|27460799|27460810|27460827|27460829|27460848|27460862|27460871
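To feed the results of the first query into a second one, the page IDs have to be pulled out of the geosearch response and joined with "|". A minimal sketch, assuming the XML response format shown in the question; the function name pageids_param is illustrative and not part of the API, and the sample holds two entries copied from that response:

```python
import xml.etree.ElementTree as ET

# Two <gs> entries copied from the response in the question.
SAMPLE = """<?xml version="1.0"?>
<api>
  <query>
    <geosearch>
      <gs pageid="27460829" ns="0" title="Kostilkovo" lat="41.416666666667" lon="26.05" dist="4245.1" primary="" />
      <gs pageid="24179704" ns="0" title="Villa Armira" lat="41.499069444444" lon="26.106263888889" dist="8130" primary="" />
    </geosearch>
  </query>
</api>"""


def pageids_param(xml_text):
    """Return the page IDs of all <gs> results, joined with '|' for a pageids= parameter."""
    root = ET.fromstring(xml_text)
    return "|".join(gs.get("pageid") for gs in root.iter("gs"))


print(pageids_param(SAMPLE))
```

The returned string can be used as the value of pageids= in the second query.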
Alas, AFAIK you can't nest generators, so you can't do both steps in one query. You can either:
get the list of images in one query, and then use another query to get the URLs, or
start with the basic geosearch query to get the page IDs, and then get the images and their URLs in another query.
Alas, it turns out that both of these options fail to give you some information that you may want. If you use list=geosearch as a generator, you don't get the coordinate information that you may need if you e.g. wish to display the results on a map. On the other hand, using prop=images as a generator makes you miss out on something even more important: the knowledge of which images are used on which pages!
Thus, unfortunately, it seems that, if your goal is to place images on a map, you'll probably have to do it with three separate queries. At least you can still query multiple pages / images in one request, so you shouldn't need more than three (until you hit the query limits and need to use continuations, that is).
(Also, doing it in three steps lets you apply some filtering to the images before the third step. For example, most of the pages returned by your example query only have the same three images — Flag of Bulgaria.svg, Ivaylovgrad Reservoir.jpg and Oblast Khaskovo.png — all of which are used via templates, and none of which really look like good choices to represent the specific location.)
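The filtering step mentioned above could look roughly like this. It is only a sketch of one possible heuristic (my assumption, not something the API provides): drop every image that is used on more than one of the returned pages, since those tend to come from shared templates. The page names and the third file name in the sample are made up for illustration.

```python
from collections import Counter


def drop_shared_images(images_by_page, max_pages=1):
    """Keep only images that are used on at most `max_pages` of the given pages."""
    # Count on how many pages each image appears (once per page).
    counts = Counter(img for imgs in images_by_page.values() for img in set(imgs))
    return {
        page: [img for img in imgs if counts[img] <= max_pages]
        for page, imgs in images_by_page.items()
    }


# Hypothetical result of the second query (page -> file names), not real data.
SAMPLE_IMAGES = {
    "Page A": ["File:Flag of Bulgaria.svg", "File:Oblast Khaskovo.png"],
    "Page B": [
        "File:Flag of Bulgaria.svg",
        "File:Oblast Khaskovo.png",
        "File:Hypothetical local photo.jpg",
    ],
}

filtered = drop_shared_images(SAMPLE_IMAGES)
print(filtered)
```

Only the file names that survive the filter then need to go into the imageinfo query.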
Ps. If you're just interested in finding images near a particular location, even if they're not used on any specific Wikipedia article, you might want to try using geosearch directly on Wikimedia Commons. It doesn't seem to return any results for your Bulgarian example coordinates, but it works just fine in a more crowded location.

Here is an alternative to build on the previous answer. If you start with this query as a partial answer:
https://en.wikipedia.org/w/api.php?action=query&prop=images&imlimit=max&generator=geosearch&ggsradius=10000&ggscoord=41.426140|26.099319
Then you can build on this to get the information in a single query. The pageimages property can work with the generator. You cannot nest generators but you can chain properties. A query can use pageimages to get the page's main image url for each of the geosearch results. It looks like this:
https://en.wikipedia.org/w/api.php?action=query&prop=images|pageimages&pilimit=max&piprop=thumbnail&iwurl=&imlimit=max&generator=geosearch&ggsradius=10000&ggscoord=41.426140|26.099319
This query returns the image "File" names (images property) and a single URL for the main image (pageimages property). The main image of the page is all I need. You might be able to extrapolate the "file" urls by matching the changes from the file to the url that is output with the query but I cannot recommend such a hack.
The images property has a setting that is supposed to return urls for interwiki links, iwurl. I see the "file" as an interwiki link. This parameter is not working and images does not return a url. Playing on the sandbox might lead you to a better answer.
Intuitively it seems like you should be able to chain the images and imageinfo properties together. Doing so does not give the expected results.
If a single url for the main image of the page is not enough I can encourage you to play in the API sandbox to try and get what you need with some combination of properties. I am using the geosearch generator and get the page image, text description, and lat/long coordinates so that I can get the address. Good luck!
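As a rough illustration of the single-query approach, the sketch below builds a geosearch-generator query that asks for the main image and the coordinates of each page, and then reads both out of a JSON response. The assumptions are mine: format=json instead of the default output, prop=coordinates with colimit for the lat/long part, and the default JSON layout where pages are keyed by page ID. The sample response is invented to show the shape only.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def nearby_pages_url(lat, lon, radius=10000):
    """Build one query: geosearch generator + main image + coordinates per page."""
    params = {
        "action": "query",
        "format": "json",  # assumption: JSON output instead of the default
        "generator": "geosearch",
        "ggscoord": f"{lat}|{lon}",
        "ggsradius": radius,
        "prop": "pageimages|coordinates",
        "piprop": "thumbnail",
        "pilimit": "max",
        "colimit": "max",
    }
    return API + "?" + urlencode(params)


def summarize(response):
    """Map page title -> (thumbnail URL or None, (lat, lon) or None)."""
    result = {}
    for page in response.get("query", {}).get("pages", {}).values():
        thumb = page.get("thumbnail", {}).get("source")
        coords = page.get("coordinates") or []
        latlon = (coords[0]["lat"], coords[0]["lon"]) if coords else None
        result[page["title"]] = (thumb, latlon)
    return result


# Invented response with the expected shape; not real API output.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "title": "Example page",
                "thumbnail": {"source": "https://example.org/thumb.jpg", "width": 50, "height": 33},
                "coordinates": [{"lat": 41.5, "lon": 26.1, "primary": "", "globe": "earth"}],
            },
            "2": {"pageid": 2, "title": "Page without image"},
        }
    }
}

print(nearby_pages_url("41.426140", "26.099319"))
print(summarize(SAMPLE_RESPONSE))
```

Pages without a main image simply have no thumbnail entry, so the sketch reports None for them.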

Related

Informix XML CLOB extract returning NULL when specifying any @attribute

Informix IDS 12.25 returns NULL whenever an @attribute is specified. In the image below, the same document is queried by two statements. The only difference between them is that one specifies an @attribute and the other doesn't. And, as the image shows, the attribute does exist, because it is returned by one of the columns.
I've searched a lot and read through a lot of documentation, and every source says that the syntax is correct. I don't know what else to try. Thanks a lot.
[Edit]
Here goes a sample of the xml File I'm working with:
<Frame>
<Shape sizeX="5400" sizeY="4400" distance="1800">
<ShapePoint>
<Point direction="0" radius="266" />
<Point direction="144" radius="280" />
<Point direction="243" radius="289" />
<Point direction="279" radius="291" />
</ShapePoint>
</Shape>
</Frame>
Alternative approaches for this problem, if mainly using the database engine, also would be extremely welcomed.
It's definitely a valid Xpath, except the first one selects a node, and the one that isn't working selects a string, which makes me think extractclob() is having a problem with this type of result.
Here's my test in Python to demonstrate this is the correct xpath for the given xml.
In [16]: tree.xpath('/Frame/Shape/ShapePoint/Point[1]')
Out[16]: [<Element Point at 0x102d68bc0>]
In [17]: tree.xpath('/Frame/Shape/ShapePoint/Point[1]/@radius')
Out[17]: ['266']
What happens if you use extractvalueclob() instead?
https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.xml.doc/ids_xpextractvalue.htm
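The session above appears to rely on lxml and leaves out the setup. For anyone who wants to reproduce the check with the standard library only, here is a minimal sketch using a shortened copy of the sample document. Note that ElementTree has no "/@radius" step, so the attribute is read from the selected node instead; this only confirms that the node and attribute exist, it says nothing about how Informix evaluates the expression.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document from the question.
FRAME = """<Frame>
  <Shape sizeX="5400" sizeY="4400" distance="1800">
    <ShapePoint>
      <Point direction="0" radius="266" />
      <Point direction="144" radius="280" />
    </ShapePoint>
  </Shape>
</Frame>"""

root = ET.fromstring(FRAME)  # root is the <Frame> element

# Node selection: returns an Element, like the first XPath above.
first_point = root.find("Shape/ShapePoint/Point[1]")

# Attribute selection: read the attribute from the selected node.
radius = first_point.get("radius")
print(first_point.tag, radius)
```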

Can I iterate form-lists in Moqui?

Is there a way to do the following in Moqui?
Say I have a list of parent categories (or classifications etc.)... Taking Request categories:
<entity-find entity-name="mantle.request.RequestCategory" list="parentCategoryList">
<econdition field-name="parentCategoryId" operator="is-null" />
</entity-find>
And I want to use 'parentCategoryList' to produce a sub-list for EACH parent category, to display separate form-lists on screen:
Something like:
<iterate list="parentCategoryList" entry="thisCategory" >
<entity-find entity-name="mantle.request.RequestCategory" list="categoryList">
<econdition field-name="parentCategoryId" from="thisCategory.requestCategoryId" />
</entity-find>
<!-- I include the following only to give an idea of what I am trying to do.
It is incorrect and incomplete -->
<script>listOfLists.add(categoryList)</script>
</iterate>
Then use that 'listOfLists' to iterate a form-list, supplying the form-list 'name' and 'list' sequentially for each list in the list. (I know you can't use iterate outside of actions, and you can't use forms inside of actions.)
I may well be thinking about this in the wrong way.
You can iterate within the screen.widgets element; just use section-iterate. There are limitations to how much you can nest these (the current template macros for XML Screens/Forms only support so much), but you can do quite a bit. There are examples of this in SimpleScreens, like the OrderDetail.xml screen iterating over order parts.

FetchXML next page results

I want to populate a grid with data from Dynamics CRM. I use fetchXML, to get for each page 10 records. I want to get to the next page, to retrieve the next 10 records. But this isn't happening, I'm using XRMToolbox to simulate the fetch query but it returns me the same results, regardless of the page attribute value.
The fetchXML query is:
<fetch version="1.0" output-format="xml-platform" mapping="logical" count="10" page="1" aggregate="true" distinct="false" >
<entity name="webpage" >
<attribute name="url" groupby="true" alias="url" />
<attribute name="webpageid" aggregate="count" alias="top" />
<order descending="true" alias="top" />
</entity>
</fetch>
If I change the page attribute value, say to 10 the response won't be different.
Can anyone help me with this?
UPDATE
After many tests with XRMToolbox I've come to the conclusion that this query ignores whatever page I provide. This is because of the aggregate attribute. If I remove it, and of course remove the count aggregate as well, then changing the page attribute does fetch the next page of results.
So, in summary, the page attribute doesn't work together with the aggregate attribute. Maybe this can work with paging cookies, but I haven't tested that yet; I will test it and update this post.
To implement paging you need to use not only page number/records per page attributes but paging cookie as well. This msdn article provides all code you need to implement paging.
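Since the linked article is not reproduced here, the following is only a sketch of the mechanical part: setting the page, count and paging-cookie attributes on the fetch element before sending the query again. The function name with_paging is made up, the cookie value is a placeholder (the real one comes back with the previous page of results), and the sample query is a non-aggregate variant because, per the update above, the page attribute had no effect on the aggregate query.

```python
import xml.etree.ElementTree as ET


def with_paging(fetch_xml, page, count=10, paging_cookie=None):
    """Return a copy of the FetchXML with page, count and (optionally) paging-cookie set."""
    fetch = ET.fromstring(fetch_xml)
    fetch.set("page", str(page))
    fetch.set("count", str(count))
    if paging_cookie is not None:
        # The cookie is itself XML; ElementTree escapes it when serializing the attribute.
        fetch.set("paging-cookie", paging_cookie)
    return ET.tostring(fetch, encoding="unicode")


# Non-aggregate variant of the query from the question.
QUERY = """<fetch version="1.0" mapping="logical" count="10" page="1">
  <entity name="webpage">
    <attribute name="url" />
  </entity>
</fetch>"""

# Placeholder cookie for illustration only.
page2 = with_paging(QUERY, page=2, paging_cookie='<cookie page="1" />')
print(page2)
```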

Does the Wikipedia API support searches for a specific template?

Is it possible to query the Wikipedia API for articles that contain a specific template? The documentation does not describe any action that would filter search results to pages that contain a template. Specifically, I am after pages that contain Template:Persondata. After that, I am hoping to be able to retrieve just that specific template in order to populate genealogy data for the openancestry.org project.
The query below shows that the Albert Einstein page contains the Persondata Template, but it doesn't return the contents of the template, and I don't know how to get a list of page titles that contain the template.
http://en.wikipedia.org/w/api.php?action=query&prop=templates&titles=Albert%20Einstein&tlcontinue=736|10|ParmPart
Returns:
<api>
<query>
<pages>
<page pageid="736" ns="0" title="Albert Einstein">
<templates>
...
<tl ns="10" title="Template:Persondata"/>
...
</templates>
</page>
</pages>
</query>
<query-continue>
<templates tlcontinue="736|10|Reflist"/>
</query-continue>
</api>
I suspect that I can't get what I need from the API, but I'm hoping I'm wrong and that someone has already blazed a trail down this path.
You can use the embeddedin query to find all pages that include the template:
curl 'http://en.wikipedia.org/w/api.php?action=query&list=embeddedin&eititle=Template:Persondata&eilimit=5&format=xml'
Which gets you:
<?xml version="1.0"?>
<api>
<query>
<embeddedin>
<ei pageid="307" ns="0" title="Abraham Lincoln" />
<ei pageid="308" ns="0" title="Aristotle" />
<ei pageid="339" ns="0" title="Ayn Rand" />
<ei pageid="340" ns="0" title="Alain Connes" />
<ei pageid="344" ns="0" title="Allan Dwan" />
</embeddedin>
</query>
<query-continue>
<embeddedin eicontinue="10|Persondata|595" />
</query-continue>
</api>
See full docs at mediawiki.org.
Edit: Use the embeddedin query instead of backlinks (which doesn't cover template inclusions).
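To walk through all pages in batches, the titles and the continuation value have to be read from each response. A minimal sketch, assuming the XML shape shown above with a query-continue element (newer MediaWiki versions may report continuation differently); the function name parse_embeddedin is illustrative:

```python
import xml.etree.ElementTree as ET

# The response from the answer above.
RESPONSE = """<?xml version="1.0"?>
<api>
  <query>
    <embeddedin>
      <ei pageid="307" ns="0" title="Abraham Lincoln" />
      <ei pageid="308" ns="0" title="Aristotle" />
      <ei pageid="339" ns="0" title="Ayn Rand" />
      <ei pageid="340" ns="0" title="Alain Connes" />
      <ei pageid="344" ns="0" title="Allan Dwan" />
    </embeddedin>
  </query>
  <query-continue>
    <embeddedin eicontinue="10|Persondata|595" />
  </query-continue>
</api>"""


def parse_embeddedin(xml_text):
    """Return (titles, eicontinue); eicontinue is None when no continuation is given."""
    root = ET.fromstring(xml_text)
    titles = [ei.get("title") for ei in root.iter("ei")]
    cont = root.find("query-continue/embeddedin")
    return titles, (cont.get("eicontinue") if cont is not None else None)


titles, eicontinue = parse_embeddedin(RESPONSE)
print(titles, eicontinue)
```

Passing the returned value as eicontinue= in the next request should yield the following batch; the loop ends when no continuation value comes back.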
Using embeddedin does not allow you to search for a specific person, the search string becomes the Template:Persondata.
The best way I've found to get only people from Wikipedia is to use list=search and filter the search using AND"Born"AND"Occupation":
http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch="Tom Cruise"AND"Born"AND"Occupation"&format=jsonfm&srprop=snippet&srlimit=50
Remember that Wikipedia is using a search engine that doesn't yet allow us to search only the title, it will search the full text. You can take advantage of that to get more precise results.
The accepted answer explains how to list pages using a certain template, but if you need to search for pages using the template, you can with the hastemplate: search keyword: https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=hastemplate:NPOV%20physics

list=alllinks confusion

I'm doing a research project for the summer and I've got to get some data from Wikipedia, store it and then do some analysis on it. I'm using the Wikipedia API to gather the data and I've got that down pretty well.
My question is in regard to the links-alllinks option in the API doc here
After reading the description, both there and in the API itself (it's down a bit and I can't link directly to the section), I think I understand what it's supposed to return. However, when I ran a query it gave me back something I didn't expect.
Here's the query I ran:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=google&rvprop=ids|timestamp|user|comment|content&rvlimit=1&list=alllinks&alunique&allimit=40&format=xml
Which in essence says: Get the last revision of the Google page, include the id, timestamp, user, comment and content of each revision, and return it in XML format.
The alllinks part (I thought) should give me back a list of Wikipedia pages which point to the Google page (in this case the first 40 unique ones).
I'm not sure what the policy is on swears, but this is the result I got back exactly:
<?xml version="1.0"?>
<api>
<query><normalized>
<n from="google" to="Google" />
</normalized>
<pages>
<page pageid="1092923" ns="0" title="Google">
<revisions>
<rev revid="366826294" parentid="366673948" user="Citation bot" timestamp="2010-06-08T17:18:31Z" comment="Citations: [161]Tweaked: url. [[User:Mono|Mono]]" xml:space="preserve">
<!-- The page content, I've replaced this cos its not of interest -->
</rev>
</revisions>
</page>
</pages>
<alllinks>
<!-- offensive content removed -->
</alllinks>
</query>
<query-continue>
<revisions rvstartid="366673948" />
<alllinks alfrom="!2009" />
</query-continue>
</api>
The <alllinks> part is just a load of random gobbledygook and offensive comments. Not nearly what I thought I'd get. I've done a fair bit of searching but I can't seem to find a direct answer to my question.
What should the list=alllinks option return?
Why am I getting this crap in there?
You don't want a list; a list is something that iterates over all pages. In your case you simply "enumerate all links that point to a given namespace".
You want a property associated with the Google page, so you need prop=links instead of the alllinks crap.
So your query becomes:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions|links&titles=google&rvprop=ids|timestamp|user|comment|content&rvlimit=1&format=xml
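If the query is assembled in code, several properties can be combined by joining them with "|". A small sketch; the helper name page_query_url is made up, and pllimit=max is my addition to ask for as many links per request as allowed:

```python
from urllib.parse import urlencode


def page_query_url(title, props, **extra):
    """Build an action=query URL for one title with several prop= modules."""
    params = {"action": "query", "titles": title, "prop": "|".join(props), "format": "xml"}
    params.update(extra)  # module-specific parameters such as rvlimit or pllimit
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)


url = page_query_url(
    "google",
    ["revisions", "links"],
    rvprop="ids|timestamp|user|comment|content",
    rvlimit=1,
    pllimit="max",
)
print(url)
```

Keep in mind that prop=links lists the links found on the Google page itself. If the goal is the pages that link to it, list=backlinks with bltitle is the query meant for that direction.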