Filter Wikipedia geosearch per region - wikipedia-api

I came across a Wikipedia API (the MediaWiki GeoData extension) for searching wiki pages around fixed coordinates. An example call is
https://it.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=37.786971|-122.399677
I also saw that GeoData, among its extra parameters, supports the concept of a region, accepting an ISO 3166-2 region code.
How can I search for elements, filtering by this region code? For example, if I am searching around coordinates near the border between two regions, can I filter for only the elements of one region?

Short answer: you can't.
Longer answer: we currently lack two features, which I have just filed in the issue tracker for you:
1) populating the region parameter in GeoData: most pages do not specify the region for GeoData, only in their free text (which is useless); the only structured data we have is in Wikidata;
2) adding an option to filter the results by region.
For now, you'll have to do everything yourself client-side: figure out the coordinates of each region and filter by those, or search the region in Wikidata statements and then fetch the corresponding articles in the desired language. As you are a developer, you could also help import country data into Wikidata ;-).
(Expanded from MaxSem's answer, hence wiki.)
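For the client-side route, here is a minimal Python sketch: it calls the geosearch endpoint from the question and then keeps only the results inside a bounding box. The bounding box and coordinates below are made-up placeholders; in practice you would derive the region's extent yourself, e.g. from Wikidata or a shapefile.

import requests

API = "https://it.wikipedia.org/w/api.php"

def geosearch(lat, lon, radius_m=10000):
    # Plain MediaWiki GeoData geosearch around a coordinate
    params = {
        "action": "query",
        "list": "geosearch",
        "gscoord": f"{lat}|{lon}",
        "gsradius": radius_m,
        "gslimit": 50,
        "format": "json",
    }
    return requests.get(API, params=params).json()["query"]["geosearch"]

def in_bbox(page, bbox):
    # Crude region filter: keep pages whose coordinates fall inside a
    # bounding box (min_lat, min_lon, max_lat, max_lon) supplied by you.
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= page["lat"] <= max_lat and min_lon <= page["lon"] <= max_lon

REGION_BBOX = (45.0, 7.0, 46.5, 9.0)  # hypothetical extent of the region of interest

for page in geosearch(45.46, 8.62):
    if in_bbox(page, REGION_BBOX):
        print(page["title"], page["lat"], page["lon"])

A bounding box is only a rough approximation of a region; a point-in-polygon test, or looking up each page's Wikidata item and its P131 (located in the administrative territorial entity) statement, would be more precise.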

Related

How to reconcile in OpenRefine by Wikipedia article title?

I want to reconcile a large number of records, of which I have the exact Wikipedia article titles (including parenthetical disambiguation). What is the best/fastest way to match this large number of records based on their exact Wikipedia title in OpenRefine? If I simply reconcile by text, the confidence is low and Wikidata entries with the same title get mixed up.
Transform your values into Wikipedia URLs, for instance with the following GREL formula (assuming all articles are on the English Wikipedia):
'https://en.wikipedia.org/wiki/'+value
You can then reconcile this column with the Wikidata reconciliation service, which will recognize these URLs and resolve the Wikidata items via site links.
If some of your article titles are disambiguation pages, reconciliation will give you disambiguation items, so it is good practice to double-check their type (P31) by fetching it after reconciliation.
I think you are approaching this from the opposite direction. Use Wikidata item numbers (Q-ids), which are also available for disambiguation pages! The Wikidata item link is in the left-hand pane of every Wikipedia article. It provides disambiguation, and it is language-neutral and queryable. Every Wikipedia entry has a Wikidata entry.
There might also be a SPARQL query that would do this work for you. If you ask some of the Wikidatans, they can help; try #wikidatafacts on Twitter.
If you need to include non-linked text, which may appear in some of the disambiguation page lists, the manual nature of Wikipedia won't help you, but you could spot-check for those outliers.
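If you want to verify the title-to-item mapping outside OpenRefine, a minimal sketch using the MediaWiki pageprops API (the two titles below are just examples) resolves exact English Wikipedia titles to Wikidata Q-ids:

import requests

API = "https://en.wikipedia.org/w/api.php"

def wikidata_ids(titles):
    # Resolve exact English Wikipedia titles to Wikidata Q-ids via pageprops
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "titles": "|".join(titles),
        "format": "json",
    }
    pages = requests.get(API, params=params).json()["query"]["pages"]
    return {p["title"]: p.get("pageprops", {}).get("wikibase_item")
            for p in pages.values()}

print(wikidata_ids(["Mercury (planet)", "Mercury (element)"]))

Because this matches on exact titles (including the parenthetical disambiguation), it avoids the fuzzy-matching ambiguity described in the question.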

Limit Graph DB responses per category

I'm sure there's already an SO question asking the same thing, but I have been unable to find it. Perhaps I just don't know the right words to express this question properly. Some context: I'm deciding whether AWS Neptune is a good fit for an upcoming project, so I apparently have access to a SPARQL engine and a TinkerPop/Gremlin engine, if that helps answer the question.
Let's say I have items in a graph database. Those items have a "property" called category. Is there a way to get a maximum of 20 items from each distinct category?
I've scoured various SPARQL resources (e.g. the docs) and haven't found anything describing this sort of functionality, and I've never had to do this in my previous SPARQL encounters. I'm not too familiar with TinkerPop and Gremlin, but my initial readings haven't touched on this either.
It's fairly straightforward with Gremlin, using the air-routes graph, which has a region property for each airport. The following query will return at most five airports each for California and Texas (there are more than five of each in the graph).
gremlin> g.V().has('airport','region',within('US-CA','US-TX')).
group().
by('region').
by(identity().limit(5).fold())
==>[US-TX:[v[3],v[8],v[11],v[33],v[38]],US-CA:[v[13],v[23],v[24],v[26],v[27]]]
EDITED: added an additional example that does not look for specific regions.
gremlin> g.V().hasLabel('airport').
limit(50).
group().
by('region').
by(identity().limit(5).fold())
==>[US-FL:[v[9],v[15],v[16],v[19],v[25]],US-NV:[v[30]],US-HI:[v[37]],US-TX:[v[3],v[8],v[11],v[33],v[38]],US-WA:[v[22]],US-NY:[v[12],v[14],v[32],v[35]],US-NC:[v[21]],US-LA:[v[34]],GB-ENG:[v[49],v[50]],US-PA:[v[45]],US-DC:[v[7]],US-NM:[v[44]],US-AZ:[v[20],v[43]],US-TN:[v[4]],CA-BC:[v[48]],CA-ON:[v[47]],PR-U-A:[v[40]],US-MN:[v[17]],US-IL:[v[18]],US-AK:[v[2]],US-VA:[v[10]],US-CO:[v[31]],US-MD:[v[6]],US-IA:[v[36]],US-MA:[v[5]],US-CA:[v[13],v[23],v[24],v[26],v[27]],US-UT:[v[29]],US-OH:[v[41]],US-GA:[v[1]],US-MI:[v[46]]]
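If you are driving Neptune (or any TinkerPop server) from code rather than the Gremlin console, the same grouping pattern looks roughly like this with gremlinpython; the endpoint URL is a placeholder, and this is only a sketch of the console query above, not Neptune-specific advice.

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import P
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Placeholder endpoint -- replace with your own Gremlin server / Neptune endpoint
conn = DriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin", "g")
g = traversal().withRemote(conn)

# At most five airports per region, for the two regions of interest
result = (g.V().has("airport", "region", P.within("US-CA", "US-TX")).
          group().
          by("region").
          by(__.identity().limit(5).fold()).
          next())
print(result)
conn.close()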

Fetching Wikidata labels in other languages from a reconciled column

I want to use Wikidata reconciliation to translate a column of terms into various languages by fetching the labels in those languages. Using SPARQL, I'd filter a label query by language (this is the approach suggested in various similar cases). I don't see how to do the same using OpenRefine reconciliation, however.
Maybe the problem is that the Wikidata API is language-specific?
Say that you want to fetch labels in Italian, which has the language code it. You can do that by entering Lit in the property input. You can also fetch descriptions with Dit or aliases with Ait. To fetch these terms in other languages, replace the it suffix with the relevant language code.
This is only documented at https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation so far; I acknowledge that we need more visible documentation for this (ideally it should be easily accessible from OpenRefine's user interface, given that the reconciliation service comes preconfigured in OpenRefine).
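If you ever need the same lookup outside OpenRefine, a minimal sketch against the Wikidata wbgetentities API fetches labels for a batch of items in one language (the Q-ids below are just examples, Berlin and Paris):

import requests

API = "https://www.wikidata.org/w/api.php"

def labels(qids, lang="it"):
    # Fetch labels for a batch of items in one language via wbgetentities
    params = {
        "action": "wbgetentities",
        "ids": "|".join(qids),
        "props": "labels",
        "languages": lang,
        "format": "json",
    }
    entities = requests.get(API, params=params).json()["entities"]
    return {qid: ent.get("labels", {}).get(lang, {}).get("value")
            for qid, ent in entities.items()}

print(labels(["Q64", "Q90"], lang="it"))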

Why is some information from the Wikipedia infobox missing on DBpedia?

For example, the star Alpha Librae has a distance-from-earth property in the infobox, but it isn't a property of the Alpha Librae DBpedia resource. On the other hand, the star Betelgeuse does have this piece of information on DBpedia. And many other stars have this distance information in the infobox, yet there isn't any matching property in the DBpedia resource.
Is there a way to extract this missing information from DBpedia using SPARQL, or is web scraping of the wiki page the only way?
The DBpedia pages present all the data DBpedia has -- no SPARQL or other query can get data that isn't there.
DBpedia is updated periodically. It may not reflect the latest changes on Wikipedia.
Also, extractors are a living project, and may not grab every property in which you're interested.
Looking at Betelgeuse on Wikipedia, I see one distance in the infobox. Looking at Alpha_Librae, I see two distances. Which should DBpedia have? Perhaps you have the niche knowledge to ensure that the extractors do the right thing...
As #JoshuaTaylor suggests, you will probably get more satisfactory answers from the DBpedia discussion list and/or the DBpedia development list.
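To see exactly which properties DBpedia does have for a resource, you can enumerate them with a simple SPARQL query against the public endpoint. A minimal sketch in Python, using the Alpha Librae resource from the question:

import requests

ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
SELECT ?property ?value WHERE {
  <http://dbpedia.org/resource/Alpha_Librae> ?property ?value .
}
"""

# Ask Virtuoso for SPARQL JSON results and print every property/value pair
resp = requests.get(ENDPOINT, params={
    "query": QUERY,
    "format": "application/sparql-results+json",
})
for row in resp.json()["results"]["bindings"]:
    print(row["property"]["value"], "->", row["value"]["value"])

If a distance-related property does not appear in this listing, it simply was not extracted, and no query will conjure it up.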
Look at en.wikipedia.org/wiki/Volkswagen_Golf_Mk3:
In the infobox you have:
height = 1991-95 & Cabrio: {{convert|1422|mm|in|1|abbr=on}}1996-99: {{convert|1428|mm|in|1|abbr=on}}
In DBpedia you get height=1991-95
instead of
height=1422
height=1428
This happens because there is no standard way to define properties conditionally. For this reason, DBpedia properties are sometimes wrong or missing.

Returning all images on a Wikipedia page

I am trying to write a SPARQL query that will return all possible image URLs associated with a resource.
I can return the foaf:depiction, if there is one, but often when I visit the page on Wikipedia I see there are other pictures that I cannot 'get at'. For example, for video games there is the notion of a game cover and box art (for some games, not all), but I don't know how to get their URLs returned with queries.
An example showing exactly how to return, say, a box cover and cartridge picture for a game like Super Mario Bros. would answer this question perfectly.
As far as I know, DBpedia only extracts the first image from each Wikipedia article, and it's not possible to get at the other images through DBpedia.
I agree with cygri; however, there are some extra notes:
1) DBpedia extracts the first image from each Wikipedia article unless it has a "non-free" licence template;
2) you can also use Flickr images (non-Wikipedia) from the Flickr dataset [1] using the http://dbpedia.org/property/hasPhotoCollection property.
[1] http://wiki.dbpedia.org/Downloads37#linkstoflickrwrappr
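If DBpedia alone is not enough, the images that are actually on the Wikipedia page can be listed directly through the MediaWiki API (prop=images to list the files, then prop=imageinfo to resolve each file to a URL). A minimal sketch, assuming the English Wikipedia; note it returns every file on the page, including icons and flags, so you may still need to filter:

import requests

API = "https://en.wikipedia.org/w/api.php"

def image_urls(title):
    # 1) List the files used on the page
    pages = requests.get(API, params={
        "action": "query",
        "prop": "images",
        "titles": title,
        "imlimit": "max",
        "format": "json",
    }).json()["query"]["pages"]
    names = [img["title"] for page in pages.values() for img in page.get("images", [])]

    # 2) Resolve each File: page to its actual URL
    urls = []
    for name in names:
        info = requests.get(API, params={
            "action": "query",
            "prop": "imageinfo",
            "iiprop": "url",
            "titles": name,
            "format": "json",
        }).json()["query"]["pages"]
        urls += [p["imageinfo"][0]["url"] for p in info.values() if "imageinfo" in p]
    return urls

for url in image_urls("Super Mario Bros."):
    print(url)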