Query to get properties of quantity for a certain entity class - sparql

What I am trying to do is to get properties with quantity property types of a certain class (such as City, Country, Human, River, Region, Mountain, etc). I tried several classes like Country (wd:Q6256) works okay with the query below, but many other classes makes the query to exeed time limit. How can I achieve the result optimizing the query below? or is there any other way to get the properties of Quantity type in a certain class?
SELECT DISTINCT ?p_ ?pLabel ?pAltLabel
WHERE {
VALUES (?class) {(wd:Q515)}
?x ?p_ [].
?x p:P31/ps:P31 ?class.
?p wikibase:claim ?p_.
?p wikibase:directClaim ?pwdt.
?p wikibase:propertyType ?pType.
FILTER (?pType = wikibase:Quantity)
SERVICE wikibase:label { bd:serviceParam wikibase:language "ko,en". }
}

Attempt 1: Optimizing the query
Some observations:
Instead of p:P31/ps:P31, you could use wdt:P31 which is faster by avoiding the two-property hop, but finds only the truthy statements
The expensive part is the call to the label service at the end, as can be seen by commenting that line out by placing # at the start of the line
The query retrieves every claim on every city (many!), gets the properties of the claims (few!), and only removes the duplicates in the end (with DISTINCT)
As a result, the label service is called many times for the same property, once per claim! This is the big problem with the query
This can be avoided by moving the retrieval of properties with the DISTINCT into a subquery, and calling the label service only at the end on the few properties
After that change it should be fast, but is still slow because the query optimiser seems to evaluate the query in the wrong order. Following hints from this page, we can turn the query optimiser off.
This works for me:
SELECT ?p ?pLabel ?pAltLabel {
hint:Query hint:optimizer "None" .
{
SELECT DISTINCT ?p_ {
VALUES ?class { wd:Q515 }
?x wdt:P31 ?class.
?x ?p_ [].
}
}
?p wikibase:claim ?p_.
?p wikibase:propertyType ?pType.
FILTER (?pType = wikibase:Quantity)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Attempt 2: Splitting the task into multiple queries
Having learned that the approach above doesn't work for some of the biggest categories (like wd:Q5, “human”), I tried a different approach. This will get all results, but not in a single query. It requires sending ~25 individual queries and combining the results afterwards:
We start by listing the quantity properties. There are, as of today, 503 of them.
We want to keep only those properties that are actually used on an item of type “human”.
Because that check is so slow (it needs to look at millions of items), we start by only checking the first 20 properties from our list.
In the second query, we're going to check the next 20, and so on.
This is the query that tests the first 20 properties:
SELECT DISTINCT ?p ?pLabel ?pAltLabel {
hint:Query hint:optimizer "None" .
{
SELECT ?p ?pLabel ?pAltLabel {
?p wikibase:propertyType wikibase:Quantity.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
OFFSET 0 LIMIT 20
}
?p wikibase:claim ?p_.
?x ?p_ [].
?x wdt:P31 wd:Q5.
}
Increase the OFFSET to 20, 40, 60 and so on, up to 500, to test all properties.

Related

Return a table of basic physical objects?

How can I return a table of basic physical objects, e.g., Ball (Q18545), Arrow (Q45922), in Wikidata using SPARQL?
I'm not able to directly return objects with the property Physical Object (Q223557) because it has way too many records. But its subtypes, e.g., Toy (Q11422) or Projectile (Q49393), are too narrow for me. I've tried the following to get my broad query working:
removing the label service
using LIMIT for a moderate number of records
filtering out records with very few sitelinks
limiting the objects to those with ids from BNCF Thesaurus ID, BabelNet ID, etc.
Nothing has worked for me. I suspect this is straightforward for anyone who's had more than a few days with Wikidata. Please help.
I shared my wrecked query below.
SELECT ?obj #?objLabel
WHERE {
{
SELECT ?obj WHERE {
?obj wdt:P508 ?bncfid;
wdt:P2581 ?bnid;
wdt:P227 ?gndid;
wdt:P8814 ?wsid;
wdt:P18 ?image;
wikibase:sitelinks ?sitelinks;
wdt:P31/wdt:P279* wd:Q223557.
#FILTER(?sitelinks > 5).
#FILTER(LANG(?objLabel)="en").
#SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
# ?obj rdfs:label ?objLabel}
}
LIMIT 1000
}
#?obj rdfs:label ?objLabel
#FILTER(LANG(?objLabel)="en").
}
(Run by clicking here)

How to search a list of Wikidata IDs and return the P31 ("instance of") property using SPARQL?

How do I get the instance type(s) (i.e., property=P31 and associated labels) for multiple Wikidata IDs in a single query? Ideally, I want to output a list with the columns: Wikidata ID | P31 ID | P31 Label, with multiple rows used if a Wikidata ID has more than one P31 attached.
I am using the web query service, which works well in part, but I am struggling to understand the syntax. I have so far managed to work out how to process a list of items, and return each one as a row (simple I know!), but I can't work out how to generate a new column that gives the P31 item:
SELECT ?item
WHERE {
VALUES ?item { wd:Q1347065 wd:Q731635 wd:Q105492052 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
I have found the following from a previusly answered question here, which returns multiple rows per an item of interest, but this requires specifying the P31 type at the outset, which is what I am looking to generate.
Any help would be appreciated as I am really stuck understanding the syntax.
Update:
I have now worked out how to return P31s for a single ID. I need to expand this query to receive a list of IDs, and include the ID as a column:
SELECT ?item ?itemLabel
WHERE
{
wd:Q18656 wdt:P31 ?item.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
If I correctly understood your problem, you can use the following query:
SELECT ?item ?class ?classLabel
WHERE {
VALUES ?item { wd:Q1347065 wd:Q731635 wd:Q105492052 }
?item wdt:P31 ?class .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Here, first you fix the possible values for ?item, then you say that ?item is instance of a certain ?class and contestually you also retrieve the label for such ?class.

How to find relations/properties for SPARQL queries [duplicate]

whenever I start using SQL I tend to throw a couple of exploratory statements at the database in order to understand what is available, and what form the data takes.
e.g.
show tables
describe table
select * from table
Could anyone help me understand the way to complete a similar exploration of an RDF datastore using a SPARQL endpoint?
Well, the obvious first start is to look at the classes and properties present in the data.
Here is how to see what classes are being used:
SELECT DISTINCT ?class
WHERE {
?s a ?class .
}
LIMIT 25
OFFSET 0
(LIMIT and OFFSET are there for paging. It is worth getting used to these especially if you are sending your query over the Internet. I'll omit them in the other examples.)
a is a special SPARQL (and Notation3/Turtle) syntax to represent the rdf:type predicate - this links individual instances to owl:Class/rdfs:Class types (roughly equivalent to tables in SQL RDBMSes).
Secondly, you want to look at the properties. You can do this either by using the classes you've searched for or just looking for properties. Let's just get all the properties out of the store:
SELECT DISTINCT ?property
WHERE {
?s ?property ?o .
}
This will get all the properties, which you probably aren't interested in. This is equivalent to a list of all the row columns in SQL, but without any grouping by the table.
More useful is to see what properties are being used by instances that declare a particular class:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
}
This will get you back the properties used on any instances that satisfy the first triple - namely, that have the rdf:type of http://xmlns.com/foaf/0.1/Person.
Remember, because a rdf:Resource can have multiple rdf:type properties - classes if you will - and because RDF's data model is additive, you don't have a diamond problem. The type is just another property - it's just a useful social agreement to say that some things are persons or dogs or genes or football teams. It doesn't mean that the data store is going to contain properties usually associated with that type. The type doesn't guarantee anything in terms of what properties a resource might have.
You need to familiarise yourself with the data model and the use of SPARQL's UNION and OPTIONAL syntax. The rough mapping of rdf:type to SQL tables is just that - rough.
You might want to know what kind of entity the property is pointing to. Firstly, you probably want to know about datatype properties - equivalent to literals or primitives. You know, strings, integers, etc. RDF defines these literals as all inheriting from string. We can filter out just those properties that are literals using the SPARQL filter method isLiteral:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
FILTER isLiteral(?o)
}
We are here only going to get properties that have as their object a literal - a string, date-time, boolean, or one of the other XSD datatypes.
But what about the non-literal objects? Consider this very simple pseudo-Java class definition as an analogy:
public class Person {
int age;
Person marriedTo;
}
Using the above query, we would get back the literal that would represent age if the age property is bound. But marriedTo isn't a primitive (i.e. a literal in RDF terms) - it's a reference to another object - in RDF/OWL terminology, that's an object property. But we don't know what sort of objects are being referred to by those properties (predicates). This query will get you back properties with the accompanying types (the classes of which ?o values are members of).
SELECT DISTINCT ?property, ?class
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
?o a ?class .
FILTER(!isLiteral(?o))
}
That should be enough to orient yourself in a particular dataset. Of course, I'd also recommend that you just pull out some individual resources and inspect them. You can do that using the DESCRIBE query:
DESCRIBE <http://example.org/resource>
There are some SPARQL tools - SNORQL, for instance - that let you do this in a browser. The SNORQL instance I've linked to has a sample query for exploring the possible named graphs, which I haven't covered here.
If you are unfamiliar with SPARQL, honestly, the best resource if you get stuck is the specification. It's a W3C spec but a pretty good one (they built a decent test suite so you can actually see whether implementations have done it properly or not) and if you can get over the complicated language, it is pretty helpful.
I find the following set of exploratory queries useful:
Seeing the classes:
select distinct ?type ?label
where {
?s a ?type .
OPTIONAL { ?type rdfs:label ?label }
}
Seeing the properties:
select distinct ?objprop ?label
where {
?objprop a owl:ObjectProperty .
OPTIONAL { ?objprop rdfs:label ?label }
}
Seeing the data properties:
select distinct ?dataprop ?label
where {
?dataprop a owl:DatatypeProperty .
OPTIONAL { ?dataprop rdfs:label ?label }
}
Seeing which properties are actually used:
select distinct ?p ?label
where {
?s ?p ?o .
OPTIONAL { ?p rdfs:label ?label }
}
Seeing what entities are asserted:
select distinct ?entity ?elabel ?type ?tlabel
where {
?entity a ?type .
OPTIONAL { ?entity rdfs:label ?elabel } .
OPTIONAL { ?type rdfs:label ?tlabel }
}
Seeing the distinct graphs in use:
select distinct ?g where {
graph ?g {
?s ?p ?o
}
}
SELECT DISTINCT * WHERE {
?s ?p ?o
}
LIMIT 10
I often refer to this list of queries from the voiD project. They are mainly of a statistical nature, but not only. It shouldn't be hard to remove the COUNTs from some statements to get the actual values.
Especially with large datasets, it is important to distinguish the pattern from the noise and to understand which structures are used a lot and which are rare. Instead of SELECT DISTINCT, I use aggregation queries to count the major classes, predicates etc. For example, here's how to see the most important predicates in your dataset:
SELECT ?pred (COUNT(*) as ?triples)
WHERE {
?s ?pred ?o .
}
GROUP BY ?pred
ORDER BY DESC(?triples)
LIMIT 100
I usually start by listing the graphs in a repository and their sizes, then look at classes (again with counts) in the graph(s) of interest, then the predicates of the class(es) I am interested in, etc.
Of course these selectors can be combined and restricted if appropriate. To see what predicates are defined for instances of type foaf:Person, and break this down by graph, you could use this:
SELECT ?g ?pred (COUNT(*) as ?triples)
WHERE {
GRAPH ?g {
?s a foaf:Person .
?s ?pred ?o .
}
GROUP BY ?g ?pred
ORDER BY ?g DESC(?triples)
This will list each graph with the predicates in it, in descending order of frequency.

Apparently simple Wikidata SPARQL filter query is slow

Absolute Wikidata and SPARQL beginner here. I am trying to find out the Q code of a particular female name, say Jennifer. I can get it with a query like this:
SELECT ?name WHERE {
?name wdt:P31 wd:Q11879590.
?name rdfs:label ?label.
FILTER((STR(?label)) = "Jennifer")
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1
That is, I look up entities that are instance of "female given name" and then filter to those with "Jennifer" in the label. It works, but it takes 5 s or more.
If I omit the LIMIT 1 I get many instances of the same results, which signals to me that I am doing something stupid.
Bottom line, is there an efficient way to look up the Q code for a "female given name"?

How to get only the most recent value from a Wikidata property?

Suppose I want to get a list of every country (Q6256) and its most recently recorded Human Development Index (P1081) value. The Human Development Index property for the country contains a list of data points taken at different points in time, but I only care about the most recent data. This query will not work because it gets multiple results for each country (one for each Human Development Index data point):
SELECT
?country
?countryLabel
?hdi_value
?hdi_date
WHERE {
?country wdt:P31 wd:Q6256.
OPTIONAL { ?country p:P1081 ?hdi_statement.
?hdi_statement ps:P1081 ?hdi_value.
?hdi_statement pq:P585 ?hdi_date.
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Link to Query Console
I'm aware of GROUP BY/GROUP CONCAT but that will still give me every result when I'd prefer to just have one. GROUP BY/SAMPLE will also not work since SAMPLE is not guaranteed to take the most recent result.
Any help or link to a relevant example query is appreciated!
P.S. Another thing I'm confused about is why population P1082 in this query returns only one population result per country
SELECT
?country
?countryLabel
?population
WHERE {
?country wdt:P31 wd:Q6256.
OPTIONAL { ?country wdt:P1082 ?population. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
while the same query but for HDI returns multiple results per country:
SELECT
?country
?countryLabel
?hdi
WHERE {
?country wdt:P31 wd:Q6256.
OPTIONAL { ?country wdt:P1081 ?hdi. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
What is different about population and HDI that causes the behavior to be different? When I view the population data for each country on Wikidata I see multiple population points listed, but only one gets returned by the query.
Both your questions are duplicates, but I'll try to add interesting facts to existing answers.
Question 1 is a duplicate of SPARQL query to get only results with the most recent date.
This technique does the trick:
FILTER NOT EXISTS {
?country p:P1081/pq:P585 ?hdi_date_ .
FILTER (?hdi_date_ > ?hdi_date)
}
However, you should add this clause outside of OPTIONAL, it is not working inside of OPTIONAL (and I'm not sure this is not a bug).
Question 2 is a duplicate of Some cities aren't instances of city or big city?
You can't use wdt-predicates, because missing statements are not truthy.
They are normal-rank statements, but there is a preferred-rank statement.
Truthy statements represent statements that have the best non-deprecated rank for given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy. Otherwise, all normal-rank statements are considered truthy.
The reason why P1081 always has preferred statement is that this property is processed by PreferentialBot.