Extract related articles in different languages using Wikidata Toolkit - wikipedia-api

I'm trying to extract interlanguage related articles from a Wikidata dump. After searching on the internet, I found a tool named Wikidata Toolkit that helps with this type of data, but there is no information about how to find related articles in different languages. For example, the English article "Dresden" is related to the Italian article "Dresda"; I mean the second one is the Italian version of the first one.
I tried to use the toolkit, but I couldn't find any solution.
Could someone give an example of how to find such related articles?

You can use the Wikidata dump [1] to get a mapping of articles among Wikipedias in multiple languages.
For example, if you look at the Wikidata entry for Respiratory System [2], at the bottom you see all the articles on the same topic in other languages.
That mapping is available in the Wikidata dump. Just download the Wikidata dump, extract the mapping, and then get the corresponding text from the Wikipedia dump.
You might encounter some other issues, such as resolving Wikipedia redirects.
[1] https://dumps.wikimedia.org/wikidatawiki/entities/
[2] https://www.wikidata.org/wiki/Q7891
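A minimal Python sketch of that approach, assuming the JSON entity dump from [1], in which the top-level array holds one entity per line. Each entity's "sitelinks" object maps site keys such as enwiki or itwiki to article titles, so going from "Dresden" to "Dresda" is a lookup once you find the right item. The helper names and the dump file name in the example comment are hypothetical; this is plain Python, not Wikidata Toolkit.

```python
import bz2
import gzip
import json
from typing import Dict, Iterable, Iterator, TextIO


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity dict per line of a Wikidata JSON dump.

    The dump is a single JSON array with one entity per line, so each line,
    minus its trailing comma, parses on its own; "[" and "]" are skipped.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def sitelink_titles(entity: dict) -> Dict[str, str]:
    """Map site keys such as 'enwiki' or 'itwiki' to article titles."""
    return {site: link["title"] for site, link in entity.get("sitelinks", {}).items()}


def find_translations(lines: Iterable[str], site: str, title: str) -> Dict[str, str]:
    """Return every sitelink of the first item whose `site` article is `title`."""
    for entity in iter_entities(lines):
        links = sitelink_titles(entity)
        if links.get(site) == title:
            return links
    return {}


def open_dump(path: str) -> TextIO:
    """Open a plain, .gz or .bz2 dump file as UTF-8 text."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


# Example (not run here; needs a downloaded dump, file name assumed):
# with open_dump("latest-all.json.bz2") as dump:
#     print(find_translations(dump, "enwiki", "Dresden").get("itwiki"))
```

Scanning the whole dump is slow, so it is better to build the full mapping in one pass (for example, a dict from enwiki title to all sitelinks) than to call find_translations once per title.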

Related

Get Knowledge Article Record Type using REST API

Is it possible to get the record type of articles using the REST API? I can get a list of articles using this:
/services/data/v38.0/support/knowledgeArticles?sort=ViewScore&channel=App&pageSize=3
but the record type isn't included in the response. Thank you.
Knowledge articles are spread across multiple objects. Try querying (or, probably better, searching with SOSL) the KnowledgeArticleVersion object (Id, Title, Summary, ArticleType, KnowledgeArticleId). Then, depending on the article type, you can query that type's specific knowledge article version object (suffix __kav), for example Test__kav, to get the specific article.
In summary, try:
/services/data/v38.0/sobjects/KnowledgeArticleVersion
and then the article type objects specific to your org.
Edit
This might be easier:
/services/data/v37.0/query/?q=SELECT+Id,+ArticleType,+KnowledgeArticleId+FROM+KnowledgeArticleVersion+WHERE+PublishStatus='online'
You might need to add a Language filter if you have multiple languages enabled, but this query tells you the article type of the articles in question.
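A small Python sketch of calling that query endpoint, assuming you already have your instance URL and an OAuth access token (both are placeholders below). It URL-encodes the SOQL from the answer, optionally adds a Language filter, and maps each KnowledgeArticleId to its ArticleType. I wrote the PublishStatus value as 'Online'; the function names are illustrative only.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def knowledge_query_url(instance_url: str, api_version: str = "38.0",
                        language: str = "") -> str:
    """Build the REST query URL for published KnowledgeArticleVersion rows."""
    soql = ("SELECT Id, Title, ArticleType, KnowledgeArticleId "
            "FROM KnowledgeArticleVersion WHERE PublishStatus = 'Online'")
    if language:  # e.g. 'en_US' when several languages are enabled
        soql += f" AND Language = '{language}'"
    return f"{instance_url}/services/data/v{api_version}/query/?" + urlencode({"q": soql})


def article_types(response_body: str) -> dict:
    """Map KnowledgeArticleId -> ArticleType from a query response body."""
    records = json.loads(response_body)["records"]
    return {rec["KnowledgeArticleId"]: rec["ArticleType"] for rec in records}


def fetch(url: str, access_token: str) -> str:
    """GET a REST URL with an OAuth bearer token (obtaining the token is not shown)."""
    req = Request(url, headers={"Authorization": f"Bearer {access_token}"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Example (placeholders, not run here):
# url = knowledge_query_url("https://yourInstance.my.salesforce.com")
# print(article_types(fetch(url, "ACCESS_TOKEN")))
```

This sketch ignores paging: if the response has "done": false, the remaining records are available at the URL given in nextRecordsUrl.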

Web scraping a Wikipedia data table, but from DBpedia, and examples/very basic, elementary tutorial resources to build queries

I wanted to ask about the Semantic Web part, in particular using DBpedia. In general, what can and can't DBpedia do? I roughly understand the subject-verb-object model for something like DBpedia. Practically and concretely speaking, I want to web scrape the technical data (mass, thrust, etc.) found on the Wikipedia page of the Long March rocket family.
Right now, as far as I know, the way to find what DBpedia has is to find the article I'm interested in on Wikipedia, copy the last part of its URL, and paste that into DBpedia (is there any method more sophisticated than that?), which results in this page.
Looking at that page, I only see links to related articles, other links, and the abstract.
Apart from my smaller questions above, my main question is this: does DBpedia not have the data table that I want?
Next, could someone give me some tips or pointers for building a SPARQL query for DBpedia? It seems to me that one wouldn't know how to build one, as there's no "directory" of what can or can't be asked. Thanks.
DBpedia is an active project, and DBpedia extractors are continuing to evolve. Contributions that might help you would include adding infoboxes to Wikipedia pages, and data extractors to DBpedia. Check the DBpedia website for info, or write to dbpedia-discussion to get started.
As for finding DBpedia content, there are several interfaces you can work with:
Faceted Browse and Search
direct SPARQL query interface
iSPARQL, a drag-and-drop SPARQL query builder
SNORQL, another SPARQL query interface
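The direct SPARQL endpoint can also be queried from a script. When there is no "directory" of what can be asked, a reasonable first step is to list every property a resource has. This Python sketch maps a Wikipedia URL to its DBpedia resource IRI, assuming English Wikipedia (whose resources live under http://dbpedia.org/resource/) and the public endpoint https://dbpedia.org/sparql; the helper names are hypothetical.

```python
import json
from urllib.parse import unquote, urlencode, urlsplit
from urllib.request import Request, urlopen

ENDPOINT = "https://dbpedia.org/sparql"


def dbpedia_iri(wikipedia_url: str) -> str:
    """Map an English Wikipedia article URL to its DBpedia resource IRI."""
    path = urlsplit(wikipedia_url).path
    if "/wiki/" not in path:
        raise ValueError(f"not a Wikipedia article URL: {wikipedia_url}")
    title = unquote(path.split("/wiki/", 1)[1])  # undo %-escapes such as %C3%A9
    return "http://dbpedia.org/resource/" + title


def describe_query(iri: str) -> str:
    """List every property/value pair of one resource: a first 'directory'."""
    return f"SELECT ?p ?o WHERE {{ <{iri}> ?p ?o }}"


def parse_bindings(results: dict) -> list:
    """Flatten SPARQL JSON results into plain {variable: value} dicts."""
    return [{var: cell["value"] for var, cell in row.items()}
            for row in results["results"]["bindings"]]


def run_query(query: str) -> list:
    """Send a query to the public endpoint and return flattened rows."""
    req = Request(ENDPOINT + "?" + urlencode({"query": query}),
                  headers={"Accept": "application/sparql-results+json"})
    with urlopen(req) as resp:
        return parse_bindings(json.load(resp))


# Example (network access needed, not run here):
# iri = dbpedia_iri("https://en.wikipedia.org/wiki/Long_March_(rocket_family)")
# for row in run_query(describe_query(iri)):
#     print(row["p"], row["o"])
```

The property IRIs in that output are the vocabulary you can then reuse in more specific queries.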
so does dbpedia not have the data table that I want?
No, it doesn't. Usually, DBpedia gets its data from infoboxes. Your article doesn't have one, so DBpedia can't get much information out of it.

Why is some information from the Wikipedia infobox missing on DBpedia?

For example, the star Alpha Librae has a distance-from-Earth property in its infobox, but there is no such property on the Alpha Librae DBpedia resource. On the other hand, the star Betelgeuse does have this information on DBpedia. Many other stars also have this distance in their infobox, but no matching property in the DBpedia resource.
Is there a way to extract this missing information from DBpedia using SPARQL, or is web scraping the wiki page the only way?
The DBpedia pages present all the data DBpedia has; no SPARQL or other query can get data that isn't there.
DBpedia is updated periodically. It may not reflect the latest changes on Wikipedia.
Also, extractors are a living project, and may not grab every property in which you're interested.
Looking at Betelgeuse on Wikipedia, I see one distance in the infobox. Looking at Alpha_Librae, I see two distances. Which should DBpedia have? Perhaps you have the niche knowledge which can ensure that the extractors do the right thing...
As @JoshuaTaylor suggests, you will probably get more satisfactory answers from the DBpedia discussion list and/or the DBpedia development list.
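To see exactly what DBpedia holds for both stars, you can ask the endpoint for every property whose IRI contains "dist". That substring filter is a heuristic I chose, not an official property name, and the sketch assumes the public endpoint https://dbpedia.org/sparql. A star missing from the grouped result simply has no such property in DBpedia.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://dbpedia.org/sparql"


def distance_query(names):
    """SPARQL for all (subject, property, value) rows whose property IRI contains 'dist'."""
    values = " ".join(f"<http://dbpedia.org/resource/{n}>" for n in names)
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  VALUES ?s {{ {values} }}\n"
        "  ?s ?p ?o .\n"
        '  FILTER(CONTAINS(LCASE(STR(?p)), "dist"))\n'
        "}"
    )


def group_by_subject(results: dict) -> dict:
    """Group SPARQL JSON bindings as {subject: [(property, value), ...]}."""
    grouped = {}
    for row in results["results"]["bindings"]:
        grouped.setdefault(row["s"]["value"], []).append(
            (row["p"]["value"], row["o"]["value"]))
    return grouped


def run(query: str) -> dict:
    """Send the query and return the parsed SPARQL JSON results."""
    req = Request(ENDPOINT + "?" + urlencode({"query": query}),
                  headers={"Accept": "application/sparql-results+json"})
    with urlopen(req) as resp:
        return json.load(resp)


# Example (network access needed, not run here):
# print(group_by_subject(run(distance_query(["Alpha_Librae", "Betelgeuse"]))))
```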
Look at en.wikipedia.org/wiki/Volkswagen_Golf_Mk3:
In the infobox you have:
height = 1991-95 & Cabrio: {{convert|1422|mm|in|1|abbr=on}}1996-99: {{convert|1428|mm|in|1|abbr=on}}
In DBpedia you get height=1991-95
instead of
height=1422
height=1428
This happens because there is no standard way to define property values conditionally. For this reason, DBpedia properties are sometimes wrong or missing.
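To illustrate what handling such a conditional value would involve, here is a hypothetical Python helper that splits the height value above into (period, value, unit) triples. It only recognizes the "label: {{convert|value|unit|...}}" pattern and strips HTML tags such as a line-break tag from labels; it is a sketch, not how the DBpedia extraction framework works.

```python
import re
from typing import List, Tuple

# One "label: {{convert|value|unit|...}}" pair per period.
PERIOD_VALUE = re.compile(r"([^:{}]+?):\s*\{\{convert\|([\d.]+)\|([^|}]+)")
HTML_TAG = re.compile(r"<[^>]+>")


def split_conditional(value: str) -> List[Tuple[str, float, str]]:
    """Split a conditional infobox value into (label, number, unit) triples."""
    triples = []
    for m in PERIOD_VALUE.finditer(value):
        label = HTML_TAG.sub("", m.group(1)).strip()
        triples.append((label, float(m.group(2)), m.group(3)))
    return triples


height = ("1991-95 & Cabrio: {{convert|1422|mm|in|1|abbr=on}}"
          "1996-99: {{convert|1428|mm|in|1|abbr=on}}")
print(split_conditional(height))
# [('1991-95 & Cabrio', 1422.0, 'mm'), ('1996-99', 1428.0, 'mm')]
```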

Querying DBpedia for partial URI external link and official website matches

I’m trying to retrieve Wikipedia pages based on the "official website" specified on them, preferably without building a complete index of Wikipedia. If I query DBpedia using:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?s WHERE {
  ?s foaf:homepage <http://www.nytimes.com>
}
I get the desired result, but there are several issues when trying to make this work in general:
foaf:homepage is mostly not set.
I couldn’t find a queryable property that maps to "official website". In some cases, a query based on dbpedia-owl:wikiPageExternalLink works, but in others you of course get a list of pages that merely happen to link to this site.
URLs take various forms - www.example.com, www.example.com/, www.example.com/index.html, etc. and I couldn't figure out an efficient way to query based on a regular expression or even on STRSTARTS - seems like it always involves producing a huge query result and then filtering.
You are hitting on the fact that a lot of data in DBpedia is somewhat incomplete or poorly formatted. This is more or less unavoidable, since its source material is the same way. For example, foaf:homepage is sometimes missing, but that is likely because the same information is missing from the source Wikipedia page. That said, the crawling tools the DBpedia folks use sometimes miss a trick; if you think they're doing something wrong in converting Wikipedia data to RDF, let them know directly and they can adjust their crawler.
Other than that, your question is really a bit too broad to answer. foaf:homepage is the property used for the official website of a given topic; where it's not set, you simply don't know what the official site is. dbpedia-owl:wikiPageExternalLink is a general link for any external resource referenced by the wiki article, so it's not just the official website.
As for the formatting, I have yet to see this; most links I encountered while browsing are fully formed URLs. If you want us to answer that, you'll have to edit your question to include some concrete examples.
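Regarding the URL variants in the question, one option that avoids a regex or STRSTARTS filter over a huge result is to enumerate the likely variants up front and match them exactly with a VALUES clause. This Python sketch assumes the variants worth trying are http/https, with and without "www.", and with nothing, "/" or "/index.html" appended; that list is a guess and only suits bare host names. A wikiPageExternalLink match still does not mean the URL is the official site.

```python
from typing import List


def url_variants(site: str) -> List[str]:
    """Guess common spellings of a bare host name, e.g. 'www.example.com'."""
    host = site.split("://", 1)[-1].strip("/")
    if host.startswith("www."):
        host = host[len("www."):]
    variants = []
    for scheme in ("http", "https"):
        for h in (host, "www." + host):
            for suffix in ("", "/", "/index.html"):
                variants.append(f"{scheme}://{h}{suffix}")
    return variants


def homepage_query(site: str) -> str:
    """SPARQL matching any variant exactly, reporting which property linked it."""
    values = " ".join(f"<{u}>" for u in url_variants(site))
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?s ?prop WHERE {\n"
        f"  VALUES ?url {{ {values} }}\n"
        "  VALUES ?prop { foaf:homepage dbpedia-owl:wikiPageExternalLink }\n"
        "  ?s ?prop ?url .\n"
        "}"
    )


print(homepage_query("www.nytimes.com"))
```

Because the variants are fixed IRIs, the endpoint can look them up directly instead of producing a large intermediate result to filter.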

How to get information in info box of Wikipedia articles using Wikipedia api?

I'm trying to get lead actor's name from movie's Wikipedia article.
I tried different values for prop; prop=info seems the most relevant, but it doesn't contain the information in the infobox of the Wikipedia article.
See:
http://en.wikipedia.org/w/api.php?action=query&prop=info&titles=Casino_Royale_(2006_film)&format=jsonfm
Is it possible to extract information in infobox using Wikipedia API?
The MediaWiki API doesn't understand infoboxes. So, you have basically two options:
Parse the infobox yourself. You can either parse the wikitext directly or the generated HTML table (both are available from the API).
Let somebody else do the parsing. This is exactly what DBpedia does. Wikidata tries to do something similar, but it probably won't contain enough data to be usable for a long time; see growth statistics.
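A minimal Python sketch of the first option, assuming English Wikipedia: fetch the wikitext with action=parse and prop=wikitext, then pull one infobox parameter (here starring) and its link targets. The parameter scan is deliberately naive. It stops at the next line that begins with "|" or "}}", so nested templates spread over several lines may be cut short. The helper names and the User-Agent string are placeholders.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def wikitext_url(title: str) -> str:
    """API URL returning the page's wikitext as JSON (formatversion=2)."""
    return API + "?" + urlencode({"action": "parse", "page": title,
                                  "prop": "wikitext", "format": "json",
                                  "formatversion": "2"})


def fetch_wikitext(title: str) -> str:
    """Download the wikitext of one page (network access required)."""
    req = Request(wikitext_url(title),
                  headers={"User-Agent": "infobox-sketch/0.1 (placeholder contact)"})
    with urlopen(req) as resp:
        return json.load(resp)["parse"]["wikitext"]


def infobox_field(wikitext: str, name: str) -> str:
    """Naively return the raw value of '| name = ...' up to the next '|' or '}}' line."""
    lines = wikitext.splitlines()
    start = re.compile(r"^\s*\|\s*" + re.escape(name) + r"\s*=\s*(.*)$")
    for i, line in enumerate(lines):
        m = start.match(line)
        if m:
            value = [m.group(1)]
            for nxt in lines[i + 1:]:
                if re.match(r"^\s*(\||\}\})", nxt):
                    break
                value.append(nxt)
            return "\n".join(value).strip()
    return ""


def link_targets(value: str) -> list:
    """Targets of [[Target]] or [[Target|label]] links in a value."""
    return [t.split("|")[0] for t in re.findall(r"\[\[([^\]]+)\]\]", value)]


# Example (network access needed, not run here):
# text = fetch_wikitext("Casino_Royale_(2006_film)")
# print(link_targets(infobox_field(text, "starring")))
```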