How to retrieve all properties of Wikidata ordered by their usage using SPARQL

I found a query retrieving all properties of Wikidata together with property id, label, description and aliases
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?p ?pt ?pLabel ?d ?aliases WHERE {
{
SELECT ?p ?pt ?d
(GROUP_CONCAT(DISTINCT ?alias; separator="|") as ?aliases)
WHERE {
?p wikibase:propertyType ?pt .
OPTIONAL {?p skos:altLabel ?alias FILTER (LANG (?alias) = "en")}
OPTIONAL {?p schema:description ?d FILTER (LANG (?d) = "en") .}
} GROUP BY ?p ?pt ?d
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
}
}
and a query counting properties used by items pointing to Q46 through a statement
SELECT ?property ?count
WHERE {
SELECT ?property (COUNT(?item) AS ?count)
WHERE {
?item ?statement wd:Q46 . # items pointing to Q46 through a statement
?property wikibase:statementProperty ?statement . # property used for that statement
} GROUP BY ?property # count usage for each property pointing to that entity
} ORDER BY DESC(?count) # show in descending order of uses
I would like to combine them into a single query that does not depend on Q46, but I don't know exactly how.

Such a SPARQL query takes too long and runs into the execution timeout. The alternatives are:
1. Develop and use an application that
   - decompresses blocks of ~900 KiB from the bzip2 archive of the Wikidata JSON dump (https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2)
   - passes the decompressed data from each block to a JSON parser (it could be an event-driven JSON parser)
   - parses that JSON, extracting the data you need
2. Develop and use an application that
   - reads the bzip2 dump archive as described in point 1
   - imports the parsed JSON data into an SQL database
   - runs SQL queries on your own database to extract the data you need
3. Another way, involving less development effort:
   - extract the Wikidata JSON dump archive (~65 GiB), which yields a ~1.4 TB JSON file
   - develop a small application that parses that JSON file using an event-driven parser
   - parse that JSON, extracting the data you need
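As a rough illustration of the first alternative, here is a minimal Python sketch that streams the compressed dump and counts how often each property is used. It assumes the dump is laid out as a JSON array with one entity per line, and that each entity carries a "claims" object keyed by property id. The function names (iter_entities, count_property_usage, count_from_dump) are made up for this example, and the sketch counts statements rather than distinct items.

```python
import bz2
import json
from collections import Counter


def iter_entities(lines):
    """Yield one entity dict per line of a Wikidata JSON dump.

    Assumption: the dump is a JSON array with one entity per line, so the
    "[" and "]" lines are skipped and trailing commas are stripped.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue
        yield json.loads(line)


def count_property_usage(entities):
    """Count how many statements use each property across all entities."""
    usage = Counter()
    for entity in entities:
        for prop, statements in entity.get("claims", {}).items():
            usage[prop] += len(statements)
    return usage


def count_from_dump(path):
    """Stream a .json.bz2 dump without decompressing it to disk first."""
    with bz2.open(path, mode="rt", encoding="utf-8") as dump:
        return count_property_usage(iter_entities(dump))
```

Calling count_from_dump("latest-all.json.bz2").most_common() would then list the property ids in descending order of usage. Expect a full pass over the dump to take a long time.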

Related

How to get property labels from Wikidata using SPARQL

I am using SPARQLWrapper to send SPARQL queries to Wikidata.
At the moment I am trying to find all properties for an entity. Eg. with a simple tuple such as: wd:Q11663 ?a ?b. This in itself works, but I am trying to find human readable labels for the returned properties and entities.
Although SERVICE wikibase:label works using Wikidata's GUI interface, this does not work with SPARQLWrapper - which insists on returning identical values for a variable and its 'label'.
Querying on the property rdfs:label works for the entity (?b), but this approach does not work with the property (?a).
It would appear the property is being returned as a full URI such as http://www.wikidata.org/prop/direct/P1536 . Using the GUI I can successfully query wd:P1536 ?a ?b. This works with SPARQLWrapper if I send it as a second query, but not in the first query.
Here is my code:
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://query.wikidata.org/sparql")
sparql.setQuery("""
SELECT ?a ?aLabel ?propLabel ?b ?bLabel
WHERE
{
wd:Q11663 ?a ?b.
# Doesn't work with SPARQLWrapper
#SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
#?prop wikibase:directClaim ?p
# but this does (and is more portable)
?b rdfs:label ?bLabel. filter(lang(?bLabel) = "en").
# doesn't work
#?a rdfs:label ?aLabel.
# property code can be extracted successfully
BIND( strafter(str(?a), "prop/direct/") AS ?propLabel).
#BIND( CONCAT("wd:", strafter(str(?a), "prop/direct/") ) AS ?propLabel).
# No matches, even if I concat 'wd:' to ?propLabel
?propLabel rdfs:label ?aLabel
# generic search for any properties also fails
#?propLabel ?zz ?aLabel.
}
""")
# However, this returns a label for P1536 - which is one of wd:Q11663's properties
sparql.setQuery("""SELECT ?b WHERE
{
wd:P1536 rdfs:label ?b.
}
""")
So how can I get the labels for the properties in one query (which should be more efficient)?
[aside: yes I'm a bit rough & ready with the EN filter - often dropping it if I'm not getting anything back]
I was having problems with two approaches - and the code above contains a mixture of both. Also, SPARQLWrapper isn't a problem here.
The first approach using the wikibase Label service should be like this:
SELECT ?a ?aLabel ?propLabel ?b ?bLabel
WHERE
{
?item rdfs:label "weather"@en.
?item ?a ?b.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
?prop wikibase:directClaim ?a .
}
This code also includes a lookup from the label ('weather') to the query entity (?item).
The SERVICE was working, but if there isn't an rdfs:label definition then it just returns the entity. The GUI and SPARQLWrapper (to the SPARQL endpoint) were simply returning the results in a different order - so it looked like I was seeing lots of 'failed' output (ie. entities and failed labels both being reported as the same).
This became clear when I started adding an OPTIONAL clause to the approach below.
The ?prop wikibase:directClaim ?a . line turns out to be pretty simple. Wikibase uses wikibase:directClaim to link a property entity (such as wd:P1536) to the predicate URI used in direct statements (such as http://www.wikidata.org/prop/direct/P1536). Triples about the property itself (e.g. its label) are attached to the property entity. Many other ontologies simply use the same identifier for both.
My second (more generic) approach is the one you find in many books and online tutorials. The problem here is that Wikibase's predicates are full URIs under prop/direct/, and I needed to convert them into the corresponding property entity. I tried string manipulation, but this produces a string literal, not an entity. The solution is to use directClaim again:
?prop wikibase:directClaim ?a .
?prop rdfs:label ?propLabel. filter(lang(?propLabel) = "en").
Note that this only returns a result if rdfs:label is defined. Adding an OPTIONAL will return results even if there is no label defined.
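Putting the second approach together, a minimal Python sketch might look like the following. It assumes the wd:, wikibase: and rdfs: prefixes are predefined by the Wikidata endpoint (as they are used without declaration in the queries above). The helper names build_property_query and flatten_bindings are hypothetical, introduced only for this example.

```python
def build_property_query(item_id, lang="en"):
    """Build a query listing an item's direct claims with property labels.

    The OPTIONAL block keeps rows whose property has no label in `lang`.
    """
    return f"""
SELECT ?prop ?propLabel ?b WHERE {{
  wd:{item_id} ?a ?b .
  ?prop wikibase:directClaim ?a .
  OPTIONAL {{
    ?prop rdfs:label ?propLabel .
    FILTER(LANG(?propLabel) = "{lang}")
  }}
}}
"""


def flatten_bindings(results):
    """Turn SPARQL JSON results into a list of {variable: value} dicts.

    Variables left unbound by the OPTIONAL are simply absent from a row.
    """
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]
```

With the SPARQLWrapper object from the question, this would be used roughly as sparql.setQuery(build_property_query("Q11663")), then sparql.setReturnFormat(JSON), then flatten_bindings(sparql.query().convert()).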

Listing Properties and Values of an Individual on Dbpedia

How can I list properties with their values for any given DBpedia class? I'm new to this and have looked at several other questions on this but I haven't found exactly what I'm looking for.
What I'm trying to do is providing some relevant additional information to topics of conversation I have got from text mining.
Say for example the topic of conversation in a certain community is iPhones. I would like to use this word to query the DBpedia page for this word, IPhone, to get an output such as:
Type: Smartphone
Operating System: IOS
Manufacturer: Foxconn
EDIT:
Using the query from AKSW I can print the p (property?) and o (object?), although I'm still not getting the output I want. Instead of getting something like:
weight: 133.0
I get
http://dbpedia.org/property/weight:133.0
Is there a way to just get the name of the property instead of the DBpedia link?
Classes do not "have" properties with values. Instances (also called resources or individuals) are related via a property to some value, which can be another individual, a literal, or an anonymous instance (a blank node). Instances in turn belong to a class, e.g. Berlin belongs to the class City.
What you want is to get all outgoing values of a given resource in DBpedia:
SELECT * WHERE { <http://dbpedia.org/resource/IPhone> ?p ?o }
Alternatively, you can use SPARQL DESCRIBE, which returns the data as an RDF graph, i.e. a set of RDF triples:
DESCRIBE <http://dbpedia.org/resource/IPhone>
This might also return incoming information because it's not really specified in the W3C recommendation what has to be returned.
As stated by AKSW, properties often link to other classes rather than values. If you want all properties and their values, including other classes, then the query below gives you the property label and filters it by language (put the language code you need where I have put "en").
SELECT DISTINCT ?label ?o
WHERE {
<http://dbpedia.org/resource/IPhone> ?p ?o.
?p <http://www.w3.org/2000/01/rdf-schema#label> ?label .
FILTER(LANG(?label) = "" || LANGMATCHES(LANG(?label), "en"))
}
If you don't want any properties that link to other classes, then you only want datatype properties so this code could help:
SELECT DISTINCT ?label ?o
WHERE {
<http://dbpedia.org/resource/IPhone> ?p ?o.
?p <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?p a owl:DatatypeProperty .
FILTER(LANG(?label) = "" || LANGMATCHES(LANG(?label), "en"))
}
Obviously this gives you far less information and functionality, but it might just be what you're after?
Edit: In reply to your comment, it is also possible to get the labels for the values, using the same technique:
SELECT DISTINCT ?label ?oLabel
WHERE {
<http://dbpedia.org/resource/IPhone> ?p ?o.
?p <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?o <http://www.w3.org/2000/01/rdf-schema#label> ?oLabel
FILTER(LANG(?label) = "" || LANGMATCHES(LANG(?label), "en"))
}
Note that http://www.w3.org/2000/01/rdf-schema#label is often shortened to rdfs:label by defining prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
So you could also do:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?label ?oLabel
WHERE {
<http://dbpedia.org/resource/IPhone> ?p ?o.
?p rdfs:label ?label .
?o rdfs:label ?oLabel
FILTER(LANG(?label) = "" || LANGMATCHES(LANG(?label), "en"))
}
and get exactly the same result but possibly easier to read.
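Not every property has an rdfs:label, so another option is to shorten the property URI on the client side. The following is a minimal Python sketch of that idea, assuming the local name is whatever follows the last '#' or '/' in the URI. The helper names local_name and format_row are made up for this example.

```python
def local_name(uri):
    """Return the part of a URI after the last '#' (or else the last '/')."""
    for separator in ("#", "/"):
        if separator in uri:
            # rpartition splits on the last occurrence of the separator
            return uri.rpartition(separator)[2]
    return uri


def format_row(property_uri, value):
    """Format one result row as 'name: value'."""
    return f"{local_name(property_uri)}: {value}"


print(format_row("http://dbpedia.org/property/weight", "133.0"))
# prints: weight: 133.0
```

Labels from rdfs:label are usually nicer to read, so this is best kept as a fallback for properties that have none.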

Retrieving the wider dbpedia vocabulary for tagging pictures

I'm trying to develop a tool in JS for tagging pictures, so I need a set of possible "things" from DBpedia. I have already tried retrieving them this way:
select ?s ?l {
?s a owl:Class .
?s rdf:type ?l
FILTER regex(str(?s), "House", "i").
}
http://dbpedia.org/snorql/?query=select+%3Fs+%3Fl+%7B%0D%0A+++%3Fs+a+owl%3AClass+.%0D%0A+++%3Fs+rdf%3Atype+%3Fl%0D%0A+++FILTER+regex%28str%28%3Fs%29%2C+%22House%22%2C+%22i%22%29.%0D%0A%7D
And also this way:
select ?label
WHERE {
?concept a skos:Concept.
?concept skos:prefLabel ?label.
FILTER regex(str(?label), "^House", "i").
}
http://dbpedia.org/snorql/?query=select+%3Flabel+%0D%0AWHERE+%7B%0D%0A++%3Fconcept+a+skos%3AConcept.%0D%0A++%3Fconcept+skos%3AprefLabel+%3Flabel.%0D%0A++FILTER+regex%28str%28%3Flabel%29%2C+%22%5EHouse%22%2C+%22i%22%29.%0D%0A%7D
In the first case, I only get "instances" of the house "thing", but not the "House" class itself. In the second one, I never retrieve "house"; the closest match is "houses". Is there an alternative for retrieving a better vocabulary based on the DBpedia dataset?
If you don't need to restrict yourself to owl:Thing or to skos:Concept, you can just get things that have a label that contains "house". Rather than using regex, I chose to use contains and lcase, since a string containment check could be less expensive than invoking a full regular expression processor.
select ?thing ?label where {
?thing rdfs:label ?label .
filter contains(lcase(?label), "house")
}
SPARQL results (limited to 200)

Sparql to recover the Type of a DBpedia resource

I need a Sparql query to recover the Type of a specific DBpedia resource. Eg.:
pt.DBpedia resource: http://pt.dbpedia.org/resource/Argentina
Expected type: Country (as can be seen at http://pt.dbpedia.org/page/Argentina)
Using pt.DBpedia Sparql Virtuoso Interface (http://pt.dbpedia.org/sparql) I have the query below:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?l ?t where {
?l rdfs:label "Argentina"@pt .
?l rdf:type ?t .
}
But it does not return anything; it just prints the variable names.
Actually, I do not need to retrieve the resource (?l) either.
Can anyone fix it, or help me define the correct query?
http in graph name
I'm not sure how you generated your query string, but when I copy and paste your query into the endpoint and run it, I get results, and the resulting URL looks like:
http://pt.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fpt.dbpedia.org&sho...
However, the link in your question is:
http://pt.dbpedia.org/sparql?default-graph-uri=pt.dbpedia.org%2F&should-sponge...
If you look carefully, you'll see that the default-graph-uri parameters are different:
yours: pt.dbpedia.org%2F
mine: http%3A%2F%2Fpt.dbpedia.org
I'm not sure how you got a URL like the one you did, but it's not right; the default-graph-uri needs to be http://pt.dbpedia.org, not pt.dbpedia.org/.
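One way to avoid a malformed default-graph-uri is to let a library do the percent-encoding rather than assembling the URL by hand. Here is a minimal Python sketch using only the standard library. The parameter names default-graph-uri and query come from the URLs above, while the function name build_sparql_url is made up for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sparql_url(endpoint, query, default_graph_uri):
    """Build a GET URL for a SPARQL endpoint with percent-encoded parameters."""
    params = {"default-graph-uri": default_graph_uri, "query": query}
    return f"{endpoint}?{urlencode(params)}"


query = 'select ?type where { ?i rdfs:label "Argentina"@pt ; a ?type . }'
url = build_sparql_url("http://pt.dbpedia.org/sparql", query, "http://pt.dbpedia.org")

# The graph URI keeps its scheme and is encoded as http%3A%2F%2Fpt.dbpedia.org
print(url.split("&")[0])
```

Decoding the result again with parse_qs(urlsplit(url).query) should give back the original graph URI and query, which is a quick way to check what is actually being sent.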
The query is fine
When I run the query you've provided at the endpoint you've linked to, I get the results that I'd expect. It's worth noting that the label here is the literal "Argentina"@pt, and that what you've called ?l is the individual, not the label. The individual ?l has the label "Argentina"@pt.
We can simplify your query a bit, using ?i instead of ?l (to suggest individual):
select ?i ?type where {
?i rdfs:label "Argentina"@pt ;
a ?type .
}
When I run this at the Portuguese endpoint, I get these results:
If you don't want the individual in the results, you don't have to select it:
select ?type where {
?i rdfs:label "Argentina"@pt ;
a ?type .
}
or even:
select ?type where {
[ rdfs:label "Argentina"@pt ; a ?type ]
}
If you know the identifier of the resource, and don't need to retrieve it by using its label, you can even just do:
select ?type where {
dbpedia-pt:Argentina a ?type
}
type
==========================================
http://www.w3.org/2002/07/owl#Thing
http://www.opengis.net/gml/_Feature
http://dbpedia.org/ontology/Place
http://dbpedia.org/ontology/PopulatedPlace
http://dbpedia.org/ontology/Country
http://schema.org/Place
http://schema.org/Country

Returning properties from dbpedia Virtuoso

When I look at the HTML page: http://dbpedia.org/page/Bill_Nye I can see a lot of properties that are not returned in the following simple query from the Virtuoso page (http://pt.dbpedia.org/sparql):
prefix foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?s ?p WHERE {
?e foaf:name "Bill Nye"@en .
?e ?s ?p.
}
No results return when I try to access one of the properties I can see on this page- say foaf:depiction:
prefix foaf: <http://xmlns.com/foaf/0.1/>
SELECT $depiction WHERE {
?s foaf:name "Bill Nye"@en.
?e foaf:depiction ?depiction
}
When I run them via the sparql endpoint at http://dbpedia.org/sparql, after encoding
SELECT ?s ?p WHERE { ?e foaf:name "Bill Nye"@en.?e ?s ?p. }
I get
http://dbpedia.org/sparql?query=SELECT%20%3Fs%20%3Fp%20WHERE%20%7B%20%3Fe%20foaf%3Aname%20%22Bill%20Nye%22@en.%3Fe%20%3Fs%20%3Fp.%20%7D&format=json
And a result of what looks like all the properties shown at http://dbpedia.org/page/Bill_Nye. I would love an explanation of the difference. Is it simply the Virtuoso interface or something more? I'm pretty fresh at this semantic web, so please be gentle.
Please note that you are sending the queries to two different Virtuoso installations:
pt.dbpedia.org/sparql : is the international chapter for Portuguese language (PT stands for DBpedia Portuguese)
dbpedia.org/sparql : is the main SPARQL endpoint for DBpedia, containing data from multiple languages, but in an English-centric way.
You will also have different experiences with es.dbpedia.org, it.dbpedia.org, el.dbpedia.org, etc.
I18n chapters do not load exactly the same data sets as the main DBpedia. Please see:
http://mappings.dbpedia.org/index.php/DBpedia_datasets
I believe the following query (ordered) will enable you to more easily identify the missing results --
prefix foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?s ?p ?o WHERE
{
?s foaf:name "Bill Nye"@en .
?s ?p ?o.
}
order by 2
Here you can easily see that all the missing results have predicates of the form -
is ********** of
Basically, these are just eye candy in the human-viewable HTML view. They represent incoming relationships, i.e. triples where http://dbpedia.org/resource/Bill_Nye is the object of some other subject.
That is -
<something else> ?p <http://dbpedia.org/resource/Bill_Nye>
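To fetch those "is ... of" rows yourself, the resource has to appear in the object position of the query. Here is a small Python sketch that builds both query directions as strings. The function names outgoing_query and incoming_query are made up for this example.

```python
def outgoing_query(resource_uri):
    """Triples where the resource is the subject (what the queries above return)."""
    return f"SELECT ?p ?o WHERE {{ <{resource_uri}> ?p ?o }}"


def incoming_query(resource_uri):
    """Triples where the resource is the object (the 'is ... of' rows)."""
    return f"SELECT ?s ?p WHERE {{ ?s ?p <{resource_uri}> }}"


bill_nye = "http://dbpedia.org/resource/Bill_Nye"
print(outgoing_query(bill_nye))
print(incoming_query(bill_nye))
```

Running both queries against the same endpoint and combining the results should roughly correspond to what the HTML page displays.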