SPARQL query coming up empty

I'd like to query DBPedia with SPARQL to select car manufacturers (of France to start), but I have no idea how to do this.
I started with this query:
select distinct * where {
?carManufacturer dbpedia-owl:product <http://dbpedia.org/page/Automobile> .
?carManufacturer dbpprop:locationCountry "France"@en
} LIMIT 100
However, it returns an empty result set. What am I doing wrong?

You’re using the wrong URI to identify DBPedia resources. On DBPedia, http://dbpedia.org/resource/Automobile refers to the noun ‘automobile’, while http://dbpedia.org/page/Automobile refers to a page describing the noun ‘automobile’. Thus,
SELECT DISTINCT * WHERE {
?carManufacturer dbpedia-owl:product <http://dbpedia.org/resource/Automobile>.
?carManufacturer dbpprop:locationCountry "France"@en.
} LIMIT 100
works just fine.
However, if you want to be a bit more idiomatic, you can use a bit of syntactic sugar to eliminate the subject repetition in your query. DBPedia also loads http://dbpedia.org/resource/ as prefix dbpedia, so you can actually eliminate all URIs from your query entirely:
SELECT DISTINCT * WHERE {
?carManufacturer dbpedia-owl:product dbpedia:Automobile;
dbpprop:locationCountry "France"@en.
} LIMIT 100
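If the URIs come from copying browser addresses, the /page/ to /resource/ fix above can be applied mechanically before the query is built. The following is a minimal Python sketch of that idea; the function name to_resource_uri is made up for illustration and is not part of any DBpedia tooling.

```python
from urllib.parse import urlsplit, urlunsplit


def to_resource_uri(uri: str) -> str:
    """Map a DBpedia /page/ URI (the HTML description) to the /resource/ URI
    (the thing itself). Any other URI is returned unchanged.

    Hypothetical helper, written only to illustrate the answer above.
    """
    parts = urlsplit(uri)
    if parts.netloc == "dbpedia.org" and parts.path.startswith("/page/"):
        # Swap the leading path segment and keep the local name as-is.
        path = "/resource/" + parts.path[len("/page/"):]
        return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
    return uri


# Build the triple pattern from a URI copied out of the browser.
pattern = "?carManufacturer dbpedia-owl:product <{}> .".format(
    to_resource_uri("http://dbpedia.org/page/Automobile")
)
```

URIs that are already in /resource/ form, or that belong to another host, pass through untouched, so the helper is safe to apply to every URI in a query.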

Related

Performance of `OFFSET ... LIMIT ...` query in Virtuoso

I'm trying to query dbpedia on a local installation of Virtuoso (a little over a billion triples), and would like to be able to read the entire thing in pages of about 1000 triples at a time. The following query seemed promising:
SELECT *
WHERE {
?s ?o ?p.
}
LIMIT 1000
OFFSET 10000000
until I realized that queries of this type run in time proportional to the OFFSET value.
Looking into the query plan it seems that queries such as this get translated into SQL that looks like this:
SELECT TOP 100000000, 1 __id2in ( "s_7_2_t0"."S") AS "s",
__id2in ( "s_7_2_t0"."P") AS "o",
__ro2sq ( "s_7_2_t0"."O") AS "p"
FROM DB.DBA.RDF_QUAD AS "s_7_2_t0"
OPTION (QUIETCAST)
which confirms my observation.
Is it possible to run such queries in constant time, either in SPARQL or directly in SQL on the underlying table? Since it's all SQL under the hood, I had hoped it would be a straightforward matter of writing the corresponding SQL query, but for some reason the query select * from DB.DBA.RDF_QUAD limit 1; fails with a syntax error, which leaves me more confused than ever.

Fast publication date lookup with Wikidata Query Service

Is there a way to lookup publication dates quickly in Wikidata Query Service's SPARQL to find publications of a certain date, e.g., today?
I was hoping that something like this query would be quick:
SELECT * WHERE {
?work wdt:P577 ?datetime .
BIND("2018-09-28T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> as ?now_datetime)
FILTER (?datetime = ?now_datetime)
}
LIMIT 10
However, it times out when using it on the SPARQL endpoint at https://query.wikidata.org
A range query does not seem to be quick either. The query below returns after almost 30 seconds:
SELECT * WHERE {
?work wdt:P577 ?datetime .
FILTER (?datetime > "2018-09-28T00:00:00Z"^^xsd:dateTime)
}
LIMIT 1
The trick is to avoid full scan and use indexes:
VALUES:
SELECT * WHERE {
VALUES (?datetime) {("2018-09-28T00:00:00Z"^^xsd:dateTime)}
?work wdt:P577 ?datetime .
} LIMIT 10
hint:rangeSafe:
SELECT * WHERE {
VALUES (?datetime) {("2018-09-28T00:00:00Z"^^xsd:dateTime)}
?work wdt:P577 ?date_time .
hint:Prior hint:rangeSafe true .
FILTER (?date_time > ?datetime)
} LIMIT 10
[The rangeSafe hint] declare[s] that the data touched by the query for a specific triple pattern is strongly typed, thus allowing a range filter to be pushed down onto an index.
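To look up "today" rather than a fixed date, the VALUES form of the query can be generated from a date object. This is a minimal Python sketch, assuming (as in the queries above) that the endpoint predefines the wdt: and xsd: prefixes and that publication dates are stored as midnight UTC timestamps; the function name publications_on is hypothetical.

```python
from datetime import date


def publications_on(day: date, limit: int = 10) -> str:
    """Return the VALUES-based query from the answer above for one day.

    Assumes dates are stored as midnight UTC xsd:dateTime literals,
    which is how the example literal in this thread is written.
    """
    stamp = f"{day.isoformat()}T00:00:00Z"
    return (
        "SELECT * WHERE {\n"
        f'  VALUES (?datetime) {{("{stamp}"^^xsd:dateTime)}}\n'
        "  ?work wdt:P577 ?datetime .\n"
        f"}} LIMIT {limit}"
    )


# Query text for the date used in the question.
query = publications_on(date(2018, 9, 28))
```

Passing date.today() instead of a fixed date gives the query for the current day; the string can then be sent to the endpoint with whatever HTTP client is already in use.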

Count properties in a large data set via SPARQL

I'd like to have a list of most used properties in a SPARQL endpoint. The most straightforward query would be:
select ?p ( count ( distinct * ) as ?ct )
{
?s ?p ?o.
}
group by ?p
order by desc ( ?ct )
limit 1000
The problem is that there are too many triples (1.6 billion) and the server times out. So, after googling, I've also tried this, to get at least some sample statistics (yes, it's Virtuoso-specific and that's fine in my case):
select ?p ( count ( distinct * ) as ?ct )
{
?s ?p ?o.
FILTER ( 1 > <SHORT_OR_LONG::bif:rnd> (0.0001, ?s, ?p, ?o) )
}
group by ?p
order by desc ( ?ct )
limit 1000
But it times out anyway, I guess because it still has to group, count and then order. So, how can I do it? I have access to the Virtuoso relational DB (i.e., iSQL), but I cannot find docs about SQL syntax and how to select random triples from the table db.dba.rdf_quad.
EDIT: I've fixed the queries, initially they were wrong, thanks for the comments. The versions above still don't work.
OK, I've found a way, at least a partial one: Virtuoso has a command line administration tool, isql. This accepts SPARQL queries as well, in the form: SPARQL <query>;. And they're executed without timeout or result size restrictions.
This still doesn't help if you can only access the endpoint via HTTP; I don't know whether it is possible at all in that case.

How to force virtuoso sparql endpoint return full answer?

I want to query DBpedia using Virtuoso. For queries with very large result sets, it returns only part of the results. For example, in the query below, the predicate http://dbpedia.org/ontology/birthplace is missing. Is there any way to get all the results, either from Virtuoso or from another endpoint?
SELECT DISTINCT ( ?p AS ?outEdge )
( ?q AS ?inEdge )
( ?px AS ?dest )
( ?qx AS ?source )
WHERE {
{ <http://dbpedia.org/resource/England> ?p ?px . }
UNION
{ ?qx ?q <http://dbpedia.org/resource/England> . }
}
While I don't detect anything malicious or mean-spirited in your question, you're essentially asking how to circumvent DBpedia's defenses against intentional and unintentional denial-of-service attacks. Internal limits help to ensure that no single query consumes too many resources. The right way to get all the results from a SPARQL query, if they aren't all returned at once, is to use limit, offset, and order by, and to use multiple queries. E.g.,
#-- get first 10 results
select ... where ...
order by ?name
limit 10 offset 0
#-- get next 10 results
select ... where ...
order by ?name
limit 10 offset 10
#-- get more results
select ... where ...
order by ?name
limit 10 offset 20
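The paging pattern above can be driven by a small loop that stops once a page comes back short. This is a minimal Python sketch, not a definitive client: run_query is a hypothetical callable that you supply (it sends one query string to the endpoint and returns the rows as a list), and fetch_all is likewise a name invented for illustration.

```python
from typing import Callable, Iterator, List


def fetch_all(
    run_query: Callable[[str], List[dict]],
    body: str,
    order_by: str,
    page_size: int = 10,
) -> Iterator[dict]:
    """Yield every row of a query by issuing ORDER BY / LIMIT / OFFSET pages.

    run_query is an assumed, caller-provided function; this sketch does not
    talk to any endpoint itself.
    """
    offset = 0
    while True:
        # A stable ORDER BY is needed so that pages do not overlap.
        query = f"{body}\nORDER BY {order_by}\nLIMIT {page_size} OFFSET {offset}"
        rows = run_query(query)
        yield from rows
        # A short (or empty) page means there is nothing left to fetch.
        if len(rows) < page_size:
            break
        offset += page_size
```

Keeping the page size at or below the endpoint's own result cap matters here: if the server silently truncates a page, the loop would mistake the truncation for the end of the results.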

How to form SPARQL queries that refers to multiple resources

My question is a followup with my first question about SPARQL here.
My SPARQL query results for Mountain objects are here.
From those results I picked a certain object resource.
Now I want to get values of "is dbpedia-owl:highestPlace of" records for this chosen Mountain object.
That is, names of mountain ranges for which this mountain is highest place of.
This is, as I figure, complex. Not only because I do not know the required syntax, but also I get two objects here.
One of them is Mont Blanc Massif, which is of type "place".
Another one is Western Alps which is of type "mountain range" - my desired record.
I need record # 2 above but not 1. I know 1 is also relevant but sometimes it doesn't follow same pattern. Sometimes the records appear to be of YAGO type, which can be totally misleading. To be safe, I simply want to discard those records whenever there is type mismatch.
How can I form my SPARQL query to get these "is dbpedia-owl:highestPlace of" records and also have the type filtering?
You can use this query. Note, however, that Mont_Blanc_massif in your example is both a dbpedia-owl:Place and a dbpedia-owl:MountainRange.
select * where {
?place dbpedia-owl:highestPlace :Mont_Blanc.
?place rdf:type dbpedia-owl:MountainRange.
}
edit after comment: filter
It is not really clear what you want to filter out (YAGO types?). Technically, you can filter like this, for example:
select * where {
?place dbpedia-owl:highestPlace :Mont_Blanc.
?place rdf:type dbpedia-owl:MountainRange.
FILTER NOT EXISTS {
?place ?pred ?obj
FILTER (regex(str(?obj), "yago"))
}
}
This filters out results that have any object with 'yago' in its URL.
Extending the result from the previous answer, the appropriate query would be
select * where {
?mountain a dbpedia-owl:Mountain ;
dbpedia-owl:abstract ?abstract ;
foaf:depiction ?depiction .
?range a dbpedia-owl:MountainRange ;
dbpedia-owl:highestPlace ?mountain .
FILTER(langMatches(lang(?abstract),"EN"))
}
LIMIT 10
This selects mountains with English abstracts that have at least one depiction (or else the pattern wouldn't match) and for which there is some mountain range of which the mountain is the highest place. Without the parts from the earlier question, if you just want to retrieve mountains that are the highest place of a range, you can use a query like this:
select * where {
?mountain a dbpedia-owl:Mountain .
?range a dbpedia-owl:MountainRange ;
dbpedia-owl:highestPlace ?mountain .
}
LIMIT 10