Why is some information from the Wikipedia infobox missing on DBpedia? - sparql

Why is some information from the Wikipedia infobox missing on DBpedia?
For example, star Alpha Librae has property distance-from-earth in the infobox, but it isn't a property of the Alpha Librae dbpedia resource. On the other hand, star Betelgeuse has this piece of information on DBpedia). And many other stars have this distance information in the infobox, but there isn't any matching property in the DBpedia resource.
Is there a way to extract thise missing information from DBpedia using SPARQL or is the only way web scraping of the wiki page?

The DBpedia pages present all the data DBpedia has -- no SPARQL nor other query can get data that isn't there.
DBpedia is updated periodically. It may not reflect the latest changes on Wikipedia.
Also, extractors are a living project, and may not grab every property in which you're interested.
Looking at Betelgeuse on Wikipedia, I see one distance in the infobox. Looking at Alpha_Librae, I see two distances. Which should DBpedia have? Perhaps you have the niche knowledge which can ensure that the extractors do the right thing...
As #JoshuaTaylor suggests, you will probably get more satisfactory answers from the DBpedia discussion list and/or the DBpedia development list.

Look at en.wikipedia.org/wiki/Volkswagen_Golf_Mk3:
In the infobox you have:
height = 1991-95 & Cabrio: {{convert|1422|mm|in|1|abbr=on}}1996-99: {{convert|1428|mm|in|1|abbr=on}}
In dbpedia you get height=1991-95
instead of
height=1442
height=1428
This happens because there is no standard how to define properties in a conditional way. For this reason, dbpedia properties are sometimes wrong/missing

Related

How to reconcile in OpenRefine by Wikipedia article title?

I want to reconcile a large number of records, of which I have the exact Wikipedia article titles (including parenthetical disambiguation). What is the best/fastest way to match this large number of records based on their exact Wikipedia title in OpenRefine? If I simply reconcile by text, the confidence is low and Wikidata entries with the same title get mixed up.
Transform your values into Wikipedia URLs, for instance with the following GREL formula (assuming all articles are on the English Wikipedia):
'https://en.wikipedia.org/wiki/'+value
You can then reconcile this column with the Wikidata reconciliation service, which will recognize these URLs and resolve the Wikidata items via site links.
If your article titles contain disambiguation pages, reconciliation will give you disambiguation items, so it is a good practice to double-check their type (P31) by fetching it after reconciliation.
I think you are approaching from the opposite direction. Use #Wikidata numbers, which are also available for the disambiguation pages! The Wikidata item is on the left side pane. It provides disambiguation and is language neutral and queryable. Every Wikipedia entry has a Wikidata entry.
There might also be a SPARQL query that would do this work for you. If you ask some of the Wikidatans they can help. Try #wikidatafacts on Twitter.
If you need non-linked text included, which might be in some of the disamb page list, the manual nature of Wikipedia won’t help you. But you could spot check for those outliers.

Web scraping wikipedia data table, but from dbpedia, and examples/very basic, elementary tutorial resources to build queries

I wanted to ask about the Semantic Web part, in particular using DBpedia. In general, what DBpedia can and can’t do? I roughly understand the subject-verb-object model for something like DBpedia. Practically and concretely speaking, I want to web scrape the technical data (mass, thrust, etc.) found in the Wikipedia page of the Long March rocket family
Now, as of right now (i.e., as far as I know), to find what DBpedia has (i.e., how I’m using DBpedia to find data) is that I find what I’m interested in Wikipedia, copying the last part of the URL, and copy that into DBpedia (is there any method more sophisticated than that?), resulting in this page.
Looking at that page, I only see links to related articles, links, and the abstract.
Other than my smaller questions above, my main question is this: so does DBpedia not have the data table that I want?
Next, could someone help me give me some tips or pointers for building a SPARQL or query string for DBpedia? It seems to me that one wouldn't know how to build one as there's no "directory" for what could or couldn't be asked. Thanks.
DBpedia is an active project, and DBpedia extractors are continuing to evolve. Contributions that might help you would include adding infoboxes to Wikipedia pages, and data extractors to DBpedia. Check the DBpedia website for info, or write to dbpedia-discussion to get started.
As for finding DBpedia content, there are several interfaces you can work with --
Faceted Browse and Search
direct SPARQL query interface
iSPARQL, a drag-and-drop SPARQL query builder
SNORQL, another SPARQL query interface
so does dbpedia not have the data table that I want?
No, it doesn't. Usually, DBpedia gets its data from infoboxes. Your article doesn't have one, so DBpedia can't get much information out of it.

Filter Wikipedia geosearch per region

I saw Wikipedia API (called MediaWiki GeoData) to search wiki pages around fixed coordinates. An example call is
https://it.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=37.786971|-122.399677
I saw also that GeoData, in its Extra parameters, has also the concept of region, accepting a ISO 3166-2 region code.
How can I search elements, filtering per this region code? For example, if I am searching around some coordinates near the border between two regions, am I able to filter only the elements of one region?
Short answer: you can't.
Longer answer: we currently lack two features which I just filed in the issue tracker for you, i.e.
populating the region parameter in GeoData: most pages do not specify the region for GeoData but only in their free text (which is useless), the only structured data we have is in Wikidata;
adding an option to filter the results by region.
For now, you'll have to do everything yourself client-side: figure out the coordinates of each region and filter by those; or search the region in Wikidata statements and then fetch corresponding articles in the language desired. As you are a developer you could also help import country data in Wikidata ;-).
(Expanded from MaxSem's answer, hence wiki.)

What SPARQL term can I use to reference a Wikipedia category

I'm trying to extract a list of articles from a specific category in Wikipedia. I'm trying to accomplish this via DBpedia. However, I can't actually find the specific 'term' or 'field' to reference a 'category'. Where is this documented?
The section 4.2. Classifications following section from the DBpedia documentation has a link to the DCMI vocabulary, but I do not see any relevant 'field' or 'term' for category.
Use dcterms:subject
When working with DBpedia, it's almost always easier to browse some data first and then figure out how to the write the query. For instance, in this case, you can take a look at dbpedia:Mount_Monadnock. You'll see that it's related by the dcterms:subject property to four categories:
category:Landforms_of_Cheshire_County,_New_Hampshire
category:Monadnocks
category:Mountains_of_New_Hampshire
category:National_Natural_Landmarks_in_New_Hampshire
If you look at the links that those point to, you'll see the right URIs, e.g.,
http://dbpedia.org/resource/Category:Mountains_of_New_Hampshire
Now, if you're looking at the DBpedia SPARQL Endpoint web interface, you'll see a Namespace Prefixes link in the top right, which will show you the predefined (when you're working with the web interface) prefix:
category: http://dbpedia.org/resource/Category:
Thus, you could use a query like this to get a list of articles in the Mountains of New Hampshire category:
select ?article {
?article dcterms:subject category:Mountains_of_New_Hampshire
}
SPARQL results
What the DBpedia documentation says
The link to the documentation in the question points to
4.2. Classifications
… Wikipedia Categories are represented using the SKOS vocabulary and DCMI terms.
As we've seen, the dcterms:subject property is used to link articles to their categories. The SKOS vocabulary is used to connect categories. E.g., skos:broader is used to connect subcategories and supercategories. I agree that the documentation does not give quite as much information as one might expect, but this might be forgivable since the data is easy enough to browse (and the properties are described in the documentation on those vocabularies).
Related
It's a bit more specific, and would have been hard to search for if I didn't already know what to look for, but this question and answer helps:
Getting a list of American physicists from DBpedia using SPARQL
This is also helpful:
SPARQL- Retrieving a DBPedia resource by category string

How to get information in info box of Wikipedia articles using Wikipedia api?

I'm trying to get lead actor's name from movie's Wikipedia article.
I tried different values for prop, prop=info seems most relevant. But this doesn't contain the information in info box of Wikipedia article.
See:
http://en.wikipedia.org/w/api.php?action=query&prop=info&titles=Casino_Royale_(2006_film)&format=jsonfm
Is it possible to extract information in infobox using Wikipedia API?
The MediaWiki API doesn't understand infoboxes. So, you have basically two options:
Parse the infobox yourself. You can either parse the wikitext directly or the generated HTML table (both are available from the API).
Let somebody else do the parsing. This is exactly what DBPedia does. Wikidata tries to do something similar, but it probably won't contain enough data to be usable for a long time; see growth statistics.