How to get property labels from Wikidata using SPARQL - sparql

I am using SPARQLWrapper to send SPARQL queries to Wikidata.
At the moment I am trying to find all properties for an entity. Eg. with a simple tuple such as: wd:Q11663 ?a ?b. This in itself works, but I am trying to find human readable labels for the returned properties and entities.
Although SERVICE wikibase:label works using Wikidata's GUI interface, this does not work with SPARQLWrapper - which insists on returning identical values for a variable and its 'label'.
Querying on the property rdfs:label works for the entity (?b), but this approach does not work with the property (?a).
it would appear the property is being returned as a full URI such as http://www.wikidata.org/prop/direct/P1536 . Using the GUI I can successfully query wd:P1536 ?a ?b.. This works with SPARQLWrapper if I send it as a second query - but not in the first query.
Here is my code:
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://query.wikidata.org/sparql")
sparql.setQuery("""
SELECT ?a ?aLabel ?propLabel ?b ?bLabel
WHERE
{
wd:Q11663 ?a ?b.
# Doesn't work with SPARQLWrapper
#SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
#?prop wikibase:directClaim ?p
# but this does (and is more portable)
?b rdfs:label ?bLabel. filter(lang(?bLabel) = "en").
# doesn't work
#?a rdfs:label ?aLabel.
# property code can be extracted successfully
BIND( strafter(str(?a), "prop/direct/") AS ?propLabel).
#BIND( CONCAT("wd:", strafter(str(?a), "prop/direct/") ) AS ?propLabel).
# No matches, even if I concat 'wd:' to ?propLabel
?propLabel rdfs:label ?aLabel
# generic search for any properties also fails
#?propLabel ?zz ?aLabel.
}
""")
# However, this returns a label for P1536 - which is one of wd:Q11663's properties
sparql.setQuery("""SELECT ?b WHERE
{
wd:P1536 rdfs:label ?b.
}
""")
So how can I get the labels for the properties in one query (which should be more efficient)?
[aside: yes I'm a bit rough & ready with the EN filter - often dropping it if I'm not getting anything back]

I was having problems with two approaches - and the code above contains a mixture of both. Also, SPARQLWrapper isn't a problem here.
The first approach using the wikibase Label service should be like this:
SELECT ?a ?aLabel ?propLabel ?b ?bLabel
WHERE
{
?item rdfs:label "weather"#en.
?item ?a ?b.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
?prop wikibase:directClaim ?a .
}
This code also includes a lookup from the label ('weather') to the query entity (?item).
The SERVICE was working, but if there isn't an rdfs:label definition then it just returns the entity. The GUI and SPARQLWrapper (to the SPARQL endpoint) were simply returning the results in a different order - so it looked like I was seeing lots of 'failed' output (ie. entities and failed labels both being reported as the same).
This became clear when I started adding an OPTIONAL clause to the approach below.
The ?prop wikibase:directClaim ?a . line turns out to be pretty simple. Wikibase defines directClaim to map properties to entities. This then allows it to define tuples about properties (ie. a label). Many other ontologies just use the same identifiers.
My second (more generic approach) is the approach you find in many of the books and online tutorials. The problem here is that wikibase's properties have the full URL in them, and I needed to convert them into an entity. I tried string manipulation but this produces a string literal - not an entity. The solution is to use directClaim again:
?prop wikibase:directClaim ?a .
?prop rdfs:label ?propLabel. filter(lang(?propLabel) = "en").
Note that this only returns a result if rdfs:label is defined. Adding an OPTIONAL will return results even if there is no label defined.

Related

Wikidata SPARL query by generic property return object instead of statement

I want to disseminate the query by spreading the nodes like in the below query.
SELECT ?item ?itemLabel ?prop ?object ?objectLabel ?prop2 ?object2
WHERE
{
VALUES (?item) { (wd:Q12418)}
?item ?prop ?object.
?object ?prop2 ?object2
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } # Helps get the label in your language, if not, then en language
}
Here I start with wd:Q12418(Mona Lisa) and I retrieve all properties of it - than for each object I want to retrieve all properties of it as well.
The problem is that ?item ?prop ?object line returns statement for object rather than the object itself. So it's pointing to Mona Lisa's statement like this: https://www.wikidata.org/wiki/Q12418#Q12418$9c08a93a-4591-561d-36a9-78083ae0a3fa
In this case second part (?object ?prop2 ?object2) works on this statement so it returns something if only that statement has subproperties. What I'd like to get is the object(e.g. painter Leonardo Da Vinci) rather than the statement so that I can iterate through object's properties.
How can I achieve that?
You want ?prop to start with the wdt: prefix, then you can add this filter function:
filter(strstarts(str(?prop), str(wdt:)))
and, if you need, an analogous one for ?prop2.

How to retrieve all properties of Wikidata ordered by their usage using SPARQL

I found a query retrieving all properties of Wikidata together with property id, label, description and aliases
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?p ?pt ?pLabel ?d ?aliases WHERE {
{
SELECT ?p ?pt ?d
(GROUP_CONCAT(DISTINCT ?alias; separator="|") as ?aliases)
WHERE {
?p wikibase:propertyType ?pt .
OPTIONAL {?p skos:altLabel ?alias FILTER (LANG (?alias) = "en")}
OPTIONAL {?p schema:description ?d FILTER (LANG (?d) = "en") .}
} GROUP BY ?p ?pt ?d
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
}
}
and a query counting properties used by items pointing to Q46 through a statement
SELECT ?property ?count
WHERE {
SELECT ?property (COUNT(?item) AS ?count)
WHERE {
?item ?statement wd:Q46 . # items pointing to Q46 through a statement
?property wikibase:statementProperty ?statement . # property used for that statement
} GROUP BY ?property # count usage for each property pointing to that entity
} ORDER BY DESC(?count) # show in descending order of uses
I would combine them without depending on Q46 but I don't know exactly how.
Such SPARQL query will take too much time leading to execution time out. The alternatives are:
Develop & use an application that
unzip blocks of ~900 KiB from bzip2 archive of Wikidata JSON dump
(https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2)
pass that unzipped data from block to a JSON parser (it could be an event-driven JSON parser)
parse that JSON extracting valuable data
Develop & use an application that
reads bzip2 dump archive as described at point 1
import parsed JSON data into an SQL database
perform SQL queries on your own database extracting valuable data
Another way involving less development effort is:
extract Wikidata JSON dump archive (~65 GiB) resulting an ~1.4 TB json file
develop a small aplication that parse that type of json file using an event-driven parser
parse that JSON extracting valuable data

What is the best way to discover metadata about an RDF schema from a SPARQL endpoint? [duplicate]

whenever I start using SQL I tend to throw a couple of exploratory statements at the database in order to understand what is available, and what form the data takes.
e.g.
show tables
describe table
select * from table
Could anyone help me understand the way to complete a similar exploration of an RDF datastore using a SPARQL endpoint?
Well, the obvious first start is to look at the classes and properties present in the data.
Here is how to see what classes are being used:
SELECT DISTINCT ?class
WHERE {
?s a ?class .
}
LIMIT 25
OFFSET 0
(LIMIT and OFFSET are there for paging. It is worth getting used to these especially if you are sending your query over the Internet. I'll omit them in the other examples.)
a is a special SPARQL (and Notation3/Turtle) syntax to represent the rdf:type predicate - this links individual instances to owl:Class/rdfs:Class types (roughly equivalent to tables in SQL RDBMSes).
Secondly, you want to look at the properties. You can do this either by using the classes you've searched for or just looking for properties. Let's just get all the properties out of the store:
SELECT DISTINCT ?property
WHERE {
?s ?property ?o .
}
This will get all the properties, which you probably aren't interested in. This is equivalent to a list of all the row columns in SQL, but without any grouping by the table.
More useful is to see what properties are being used by instances that declare a particular class:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
}
This will get you back the properties used on any instances that satisfy the first triple - namely, that have the rdf:type of http://xmlns.com/foaf/0.1/Person.
Remember, because a rdf:Resource can have multiple rdf:type properties - classes if you will - and because RDF's data model is additive, you don't have a diamond problem. The type is just another property - it's just a useful social agreement to say that some things are persons or dogs or genes or football teams. It doesn't mean that the data store is going to contain properties usually associated with that type. The type doesn't guarantee anything in terms of what properties a resource might have.
You need to familiarise yourself with the data model and the use of SPARQL's UNION and OPTIONAL syntax. The rough mapping of rdf:type to SQL tables is just that - rough.
You might want to know what kind of entity the property is pointing to. Firstly, you probably want to know about datatype properties - equivalent to literals or primitives. You know, strings, integers, etc. RDF defines these literals as all inheriting from string. We can filter out just those properties that are literals using the SPARQL filter method isLiteral:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
FILTER isLiteral(?o)
}
We are here only going to get properties that have as their object a literal - a string, date-time, boolean, or one of the other XSD datatypes.
But what about the non-literal objects? Consider this very simple pseudo-Java class definition as an analogy:
public class Person {
int age;
Person marriedTo;
}
Using the above query, we would get back the literal that would represent age if the age property is bound. But marriedTo isn't a primitive (i.e. a literal in RDF terms) - it's a reference to another object - in RDF/OWL terminology, that's an object property. But we don't know what sort of objects are being referred to by those properties (predicates). This query will get you back properties with the accompanying types (the classes of which ?o values are members of).
SELECT DISTINCT ?property, ?class
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
?o a ?class .
FILTER(!isLiteral(?o))
}
That should be enough to orient yourself in a particular dataset. Of course, I'd also recommend that you just pull out some individual resources and inspect them. You can do that using the DESCRIBE query:
DESCRIBE <http://example.org/resource>
There are some SPARQL tools - SNORQL, for instance - that let you do this in a browser. The SNORQL instance I've linked to has a sample query for exploring the possible named graphs, which I haven't covered here.
If you are unfamiliar with SPARQL, honestly, the best resource if you get stuck is the specification. It's a W3C spec but a pretty good one (they built a decent test suite so you can actually see whether implementations have done it properly or not) and if you can get over the complicated language, it is pretty helpful.
I find the following set of exploratory queries useful:
Seeing the classes:
select distinct ?type ?label
where {
?s a ?type .
OPTIONAL { ?type rdfs:label ?label }
}
Seeing the properties:
select distinct ?objprop ?label
where {
?objprop a owl:ObjectProperty .
OPTIONAL { ?objprop rdfs:label ?label }
}
Seeing the data properties:
select distinct ?dataprop ?label
where {
?dataprop a owl:DatatypeProperty .
OPTIONAL { ?dataprop rdfs:label ?label }
}
Seeing which properties are actually used:
select distinct ?p ?label
where {
?s ?p ?o .
OPTIONAL { ?p rdfs:label ?label }
}
Seeing what entities are asserted:
select distinct ?entity ?elabel ?type ?tlabel
where {
?entity a ?type .
OPTIONAL { ?entity rdfs:label ?elabel } .
OPTIONAL { ?type rdfs:label ?tlabel }
}
Seeing the distinct graphs in use:
select distinct ?g where {
graph ?g {
?s ?p ?o
}
}
SELECT DISTINCT * WHERE {
?s ?p ?o
}
LIMIT 10
I often refer to this list of queries from the voiD project. They are mainly of a statistical nature, but not only. It shouldn't be hard to remove the COUNTs from some statements to get the actual values.
Especially with large datasets, it is important to distinguish the pattern from the noise and to understand which structures are used a lot and which are rare. Instead of SELECT DISTINCT, I use aggregation queries to count the major classes, predicates etc. For example, here's how to see the most important predicates in your dataset:
SELECT ?pred (COUNT(*) as ?triples)
WHERE {
?s ?pred ?o .
}
GROUP BY ?pred
ORDER BY DESC(?triples)
LIMIT 100
I usually start by listing the graphs in a repository and their sizes, then look at classes (again with counts) in the graph(s) of interest, then the predicates of the class(es) I am interested in, etc.
Of course these selectors can be combined and restricted if appropriate. To see what predicates are defined for instances of type foaf:Person, and break this down by graph, you could use this:
SELECT ?g ?pred (COUNT(*) as ?triples)
WHERE {
GRAPH ?g {
?s a foaf:Person .
?s ?pred ?o .
}
GROUP BY ?g ?pred
ORDER BY ?g DESC(?triples)
This will list each graph with the predicates in it, in descending order of frequency.

Listing Properties and Values of an Individual on Dbpedia

How can I list properties with their values for any given DBpedia class? I'm new to this and have looked at several other questions on this but I haven't found exactly what I'm looking for.
What I'm trying to do is providing some relevant additional information to topics of conversation I have got from text mining.
Say for example the topic of conversation in a certain community is iPhones. I would like to use this word to query the DBpedia page for this word, IPhone, to get an output such as:
Type: Smartphone
Operating System: IOS
Manufacturer: Foxconn
EDIT:
Using the query from AKSW I can print the p (property?) and o (object?), although I'm still not getting the output I want. Instead of getting something like:
weight: 133.0
I get
http://dbpedia.org/property/weight:133.0
Is there a way to just get the name of the property instead of the DBpedia link?
My Code
Classes do not "have" properties with values. Instances (resp. resources or individuals) do have a relationship via a property to some value which can be an individual itself or a literal (or some anonymous instance aka blank node). And instances belong to a class. e.g. Berlin belongs to the class City
What you want is to get all outgoing values of a given resource in DBpedia:
SELECT * WHERE { <http://dbpedia.org/resource/IPhone> ?p ?o }
Alternatively, you can use SPARQL DESCRIBE, which return the data in forms of an RDF graph resp. a set of RDF triples:
DESCRIBE <http://dbpedia.org/resource/IPhone>
This might also return incoming information because it's not really specified in the W3C recommendation what has to be returned.
As stated by AKSW properties often link to other classes rather than values. If you want all properties and their values, including other classes the the below gives you the label and filters by language (put the language code you need where have put "en").
SELECT DISTINCT ?label ?o
WHERE {
<http://dbpedia.org/resource/IPhone> ?p ?o.
?p <http://www.w3.org/2000/01/rdf-schema#label> ?label .
FILTER(LANG(?label) = "" || LANGMATCHES(LANG(?label), "en"))
}
If you don't want any properties that link to other classes, then you only want datatype properties so this code could help:
SELECT DISTINCT ?label ?o
WHERE {
<http://dbpedia.org/resource/IPhone> ?p ?o.
?p <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?p a owl:DatatypeProperty .
FILTER(LANG(?label) = "" || LANGMATCHES(LANG(?label), "en"))
}
Obviously this gives you far less information and functionality, but it might just be what you're after?
Edit: In reply to your comment, it is also possible to get the labels for the values, using the same technique:
SELECT DISTINCT ?label ?oLabel
WHERE {
<http://dbpedia.org/resource/IPhone> ?p ?o.
?p <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?o <http://www.w3.org/2000/01/rdf-schema#label> ?oLabel
FILTER(LANG(?label) = "" || LANGMATCHES(LANG(?label), "en"))
}
Note that http://www.w3.org/2000/01/rdf-schema#label is often shortened to rdfs:label by defining prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
So you could also do:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?label ?oLabel
WHERE {
<http://dbpedia.org/resource/IPhone> ?p ?o.
?p rdfs:label ?label .
?o rdfs:label ?oLabel
FILTER(LANG(?label) = "" || LANGMATCHES(LANG(?label), "en"))
}
and get exactly the same result but possibly easier to read.

Retrieving the wider dbpedia vocabulary for tagging pictures

I'm trying to develop a tool in JS for tagging pictures, so I need a set of possible "things" from dbpedia. I already tryed to retrieve this way:
select ?s ?l {
?s a owl:Class .
?s rdf:type ?l
FILTER regex(str(?s), "House", "i").
}
http://dbpedia.org/snorql/?query=select+%3Fs+%3Fl+%7B%0D%0A+++%3Fs+a+owl%3AClass+.%0D%0A+++%3Fs+rdf%3Atype+%3Fl%0D%0A+++FILTER+regex%28str%28%3Fs%29%2C+%22House%22%2C+%22i%22%29.%0D%0A%7D
And also this way:
select ?label
WHERE {
?concept a skos:Concept.
?concept skos:prefLabel ?label.
FILTER regex(str(?label), "^House", "i").
}
http://dbpedia.org/snorql/?query=select+%3Flabel+%0D%0AWHERE+%7B%0D%0A++%3Fconcept+a+skos%3AConcept.%0D%0A++%3Fconcept+skos%3AprefLabel+%3Flabel.%0D%0A++FILTER+regex%28str%28%3Flabel%29%2C+%22%5EHouse%22%2C+%22i%22%29.%0D%0A%7D
In the first case, I just have "instances" of the house "thing", but not the "House" class itself. In the second one, I never retrieve the "house" and the similar thing is "houses". Any alternative for retrieving a better vocabulary based in dbpedia dataset?
If you don't bother to restrict yourself to owl:Thing or to skos:Concept, you can just get things that have a label that contains "house". Rather than using regex, I chose to use contains and lcase, since a string containment could be less expensive than invoking a full regular expression processor.
select ?thing ?label where {
?thing rdfs:label ?label .
filter contains(lcase(?label), "house")
}
SPARQL results (limited to 200)