Querying for date range in SPARQL

I have some data in a semantic database that looks like the following, where the first column is the ID of an object and the second column is its last modified date, stored as xsd:dateTime values.
?s ?last_mod_date
http://company.com/custom.xml#obj1, 2016-08-30T08:44:49.000-04:00
http://company.com/custom.xml#obj2, 2016-08-30T17:24:21.000-04:00
http://company.com/custom.xml#obj3, 2016-08-30T09:03:57.000-04:00
http://company.com/custom.xml#obj4, 2016-07-27T03:26:44.000-04:00
http://company.com/custom.xml#obj5, 2016-08-11T03:23:53.000-04:00
http://company.com/custom.xml#obj6, 2016-07-19T03:05:03.000-04:00
I'm trying to filter this list of objects down to one item by date; my query input is unfortunately only precise to the minute, so I'm trying to use a date range to find the object, like this:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix cust: <http://company.com/custom.xml#>
SELECT ?s ?date WHERE
{
?s cust:last_mod_date ?date.
BIND("2016-08-30T09:03:00.000-0400"^^<http://www.w3.org/2001/XMLSchema#dateTime> as ?minDate).
BIND("2016-08-30T09:04:00.000-0400"^^<http://www.w3.org/2001/XMLSchema#dateTime> as ?maxDate).
FILTER(?date > ?minDate && ?date < ?maxDate)
}
The above query should find obj3, but instead it finds nothing. This is with a Sesame semantic database. Any ideas why this would be?

Your datetimes in the SPARQL query are malformed:
BIND("2016-08-30T09:03:00.000-0400"^^<http://www.w3.org/2001/XMLSchema#dateTime> as ?minDate).
BIND("2016-08-30T09:04:00.000-0400"^^<http://www.w3.org/2001/XMLSchema#dateTime> as ?maxDate).
Should be
BIND("2016-08-30T09:03:00.000-04:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> as ?minDate).
BIND("2016-08-30T09:04:00.000-04:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> as ?maxDate).
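The timezone offsets in your original BIND statements are missing a colon: xsd:dateTime only allows offsets of the form -04:00 (or Z), not -0400. Because those literals are ill-formed, the > and < comparisons against them raise an error, and FILTER treats an error like false, so every row is discarded.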
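To see the difference, here is a minimal Python sketch (not Sesame's actual code). It checks the lexical form against a regex adapted from the XML Schema 1.1 definition of xsd:dateTime, then mimics the FILTER on the data from the question. The names `is_xsd_datetime`, `parse` and `objects_between` are hypothetical; an ill-formed bound is treated like the FILTER error, so nothing matches.

```python
import re
from datetime import datetime

# Lexical form of xsd:dateTime, adapted from the XML Schema 1.1 pattern.
# The optional timezone must be "Z" or "+hh:mm" / "-hh:mm" (colon required).
XSD_DATETIME = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"
    r"-(0[1-9]|1[0-2])"
    r"-(0[1-9]|[12][0-9]|3[01])"
    r"T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|24:00:00(\.0+)?)"
    r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
)

# Sample data copied from the question.
LAST_MOD = {
    "http://company.com/custom.xml#obj1": "2016-08-30T08:44:49.000-04:00",
    "http://company.com/custom.xml#obj2": "2016-08-30T17:24:21.000-04:00",
    "http://company.com/custom.xml#obj3": "2016-08-30T09:03:57.000-04:00",
    "http://company.com/custom.xml#obj4": "2016-07-27T03:26:44.000-04:00",
    "http://company.com/custom.xml#obj5": "2016-08-11T03:23:53.000-04:00",
    "http://company.com/custom.xml#obj6": "2016-07-19T03:05:03.000-04:00",
}


def is_xsd_datetime(lexical: str) -> bool:
    """True if `lexical` is a legal xsd:dateTime lexical form."""
    return XSD_DATETIME.fullmatch(lexical) is not None


def parse(lexical: str) -> datetime:
    """Parse the fixed shape used here (fractional seconds plus offset)."""
    # %z also accepts "-0400" on Python 3.7+, so the regex check in
    # objects_between is what rejects the malformed offset.
    return datetime.strptime(lexical, "%Y-%m-%dT%H:%M:%S.%f%z")


def objects_between(min_lex: str, max_lex: str) -> list:
    """Mimic FILTER(?date > ?minDate && ?date < ?maxDate).

    An ill-formed bound makes the comparison an error, which FILTER
    treats as false, so no object is returned.
    """
    if not (is_xsd_datetime(min_lex) and is_xsd_datetime(max_lex)):
        return []
    lo, hi = parse(min_lex), parse(max_lex)
    return [s for s, d in LAST_MOD.items() if lo < parse(d) < hi]
```

With the corrected bounds, `objects_between("2016-08-30T09:03:00.000-04:00", "2016-08-30T09:04:00.000-04:00")` returns only obj3, while the `-0400` bounds return an empty list, matching the behaviour described in the question.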

Related

DBPedia: all population fields?

I am trying to extract all entities in DBPedia that have a population. However, I have found that there are different field names for population depending on the entity. For instance, http://dbpedia.org/page/Boston has the field populationTotal while http://dbpedia.org/page/Alaska has the field 2010pop. Is there a complete list of the population fields that I can query for?
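Solution, via @AKSW's comment above: query for all properties whose IRI contains "pop", and return their declared range where there is one.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?p ?range {
?p a rdf:Property
FILTER(regex(str(?p), "pop"))
OPTIONAL {?p rdfs:range ?range}
}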
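SPARQL's regex() is an unanchored, case-sensitive search over the whole IRI string, so "pop" matches anywhere, including the namespace part. The Python sketch below mimics that behaviour on a hypothetical list of property IRIs (the first two correspond to the fields named in the question; the others are made up for illustration). Passing the "i" flag, as in `regex(str(?p), "pop", "i")`, would also catch capitalised names.

```python
import re

# Hypothetical sample of property IRIs; only the first two correspond to
# field names mentioned in the question.
PROPERTIES = [
    "http://dbpedia.org/ontology/populationTotal",
    "http://dbpedia.org/property/2010pop",
    "http://dbpedia.org/property/Population",
    "http://dbpedia.org/ontology/areaTotal",
]


def matching(iris, pattern, ignore_case=False):
    """Mimic FILTER(regex(str(?p), pattern[, "i"])): unanchored search."""
    flags = re.IGNORECASE if ignore_case else 0
    return [iri for iri in iris if re.search(pattern, iri, flags)]
```

Here `matching(PROPERTIES, "pop")` keeps populationTotal and 2010pop but misses Population; with `ignore_case=True` all three population properties are kept.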

SPARQL query returns multiple birth dates for same person

I am learning SPARQL and dbpedia by working through the queries in https://www.joe0.com/2014/09/22/how-to-use-sparql-to-query-dbpedia-and-freebase/ . I am testing a query to return John Lennon's date of birth and I am running my queries in http://dbpedia.org/sparql . The query is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?x0 ?x1 WHERE {
?x0 rdf:type foaf:Person.
?x0 rdfs:label "John Lennon"@en.
?x0 dbpedia-owl:birthDate ?x1.
}
It returns two rows containing the same date (9 Oct 1940). My question is: why does the query return two rows even though it uses DISTINCT? Prior to asking this question I checked the following:
Why does my SPARQL query duplicate results?
Duplicate rows when making SPARQL queries
but I don't think they explain the duplicate dates.
Edit: I converted the results to text and pasted them below
-------------------------------------- -----------------------------------------------------
x0 x1
--------------------------------------- -----------------------------------------------------
http://dbpedia.org/resource/John_Lennon 1940-10-09
http://dbpedia.org/resource/John_Lennon "1940-10-9"^^<http://www.w3.org/2001/XMLSchema#date>
As stated in the answers, DBpedia actually has two dates: 1940-10-09 (valid) and 1940-10-9 (invalid). The fix is to add a FILTER that converts the date to a string and only keeps values of the form YYYY-MM-DD. Anyway, it works!
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?x0 ?x1 (STR(?x1) AS ?x1_str) WHERE {
?x0 rdf:type foaf:Person.
?x0 rdfs:label "John Lennon"@en.
?x0 dbpedia-owl:birthDate ?x1.
FILTER (REGEX(STR(?x1), "[0-9]{4}-[0-9]{2}-[0-9]{2}"))
}
Well, it is not your fault! The resource simply has both of these triples, as you can see here: the duplicates are in the data.
I ran your query on the DBpedia endpoint and asked for the results in an RDF-based format (Turtle), and found that the lexical forms of the date literals are actually different:
"1940-10-09"^^xsd:date
"1940-10-9"^^xsd:date
The second isn't actually a legal xsd:date, because the day must be written with two digits. The first is legal, which is probably why the SPARQL endpoint prints it in "pretty" fashion in the HTML table (as just 1940-10-09).
The result is slower queries: each access to an invalid date triggers an exception (for example, with a query from Fuseki), or else a filter has to do the job of eliminating the wrong date, which is costly.
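The difference between the two literals can be checked outside the endpoint. This Python sketch (hypothetical helper names, not DBpedia or Fuseki code) tests both lexical forms against a regex adapted from the XML Schema 1.1 definition of xsd:date, and also applies the FILTER pattern from the question the way SPARQL's REGEX does, as an unanchored search.

```python
import re

# Lexical form of xsd:date (XML Schema 1.1): month and day are always two
# digits, optionally followed by a timezone.
XSD_DATE = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"
    r"-(0[1-9]|1[0-2])"
    r"-(0[1-9]|[12][0-9]|3[01])"
    r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
)

# The pattern used in the question's FILTER; SPARQL REGEX is unanchored,
# so it is applied here with re.search.
FILTER_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_xsd_date(lexical: str) -> bool:
    """True if `lexical` is a legal xsd:date lexical form."""
    return XSD_DATE.fullmatch(lexical) is not None


def passes_filter(lexical: str) -> bool:
    """True if the question's REGEX filter would keep this value."""
    return FILTER_PATTERN.search(lexical) is not None


# The two lexical forms DBpedia returned for John Lennon's birth date.
birth_dates = ["1940-10-09", "1940-10-9"]
valid = [d for d in birth_dates if is_xsd_date(d)]
kept = [d for d in birth_dates if passes_filter(d)]
```

Both `valid` and `kept` end up as `["1940-10-09"]`: for this data the simple regex filter and the full lexical check agree.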

Getting a list of available hierarchies from statistics.gov.scot

I'm interested in obtaining a list of the distinct hierarchies available from statistics.gov.scot. The best-fit hierarchies I would like to list are as follows:
http://statistics.gov.scot/def/hierarchy/best-fit#community-health-partnership
http://statistics.gov.scot/def/hierarchy/best-fit#council-area
http://statistics.gov.scot/def/hierarchy/best-fit#country
These are available through the API section of this sample geography.
Desired results
I would like for the desired results to return:
community-health-partnership
council-area
country
How can I construct a query that would actually produce that? I can get a list of all available geographies via:
PREFIX sdmx: <http://purl.org/linked-data/sdmx/2009/dimension#>
SELECT DISTINCT ?framework
WHERE {
?a sdmx:refArea ?framework .
} LIMIT 10
I was trying something along these lines:
PREFIX fits: <http://statistics.gov.scot/def/hierarchy/best-fit#>
SELECT DISTINCT ?framework
WHERE {
?a fits ?framework .
} LIMIT 10
but naturally this syntax is not correct, since fits is only a prefix label and cannot stand on its own as a predicate.
On their SPARQL endpoint, you could start with something like this --
DESCRIBE <http://statistics.gov.scot/def/hierarchy/best-fit#country>
Then, based on those results, you might try something like this, whose results aren't exactly what you say you want, but might be better --
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?hierarchy
?label
WHERE
{ ?hierarchy rdfs:subPropertyOf <http://statistics.gov.scot/def/hierarchy/best-fit>
; rdfs:label ?label
}
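To get exactly the short names from the "Desired results" list, you only need the fragment after the "#" in each hierarchy IRI. In SPARQL 1.1 that could be added to the query above with something like `BIND(STRAFTER(STR(?hierarchy), "#") AS ?name)`, assuming the endpoint supports STRAFTER. The Python sketch below shows the same string operation on the three IRIs from the question, using the standard library's `urldefrag`; `local_names` is a hypothetical helper.

```python
from urllib.parse import urldefrag

# The three best-fit hierarchy IRIs listed in the question.
HIERARCHIES = [
    "http://statistics.gov.scot/def/hierarchy/best-fit#community-health-partnership",
    "http://statistics.gov.scot/def/hierarchy/best-fit#council-area",
    "http://statistics.gov.scot/def/hierarchy/best-fit#country",
]


def local_names(iris):
    """Return the part after '#', like STRAFTER(STR(?h), "#") in SPARQL."""
    return [urldefrag(iri).fragment for iri in iris]
```

`local_names(HIERARCHIES)` yields community-health-partnership, council-area and country, in that order.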

SPARQL filter results dates only

I'm trying to find all results that are dates, regardless of the properties they're describing. This FILTER query gives me the results I want:
PREFIX mydb: <http://mydb.org/schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?s ?p ?date
WHERE {
?s ?p ?date .
FILTER (?date > "1800-01-01"^^xsd:date)
}
But it only works because I set a bottom limit earlier than my earliest date. Is there a way to use a boolean filter for the xsd:date datatype, similar to isURI()?
FILTER ( datatype(?date) = xsd:date ) is the filter I needed.
Thanks to Stanislav Kralin for his comment.
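The datatype test avoids the arbitrary lower bound entirely. In the Python sketch below, literals are modelled as hypothetical (lexical form, datatype IRI) pairs and IRIs as plain strings; DATATYPE() on an IRI is an error in SPARQL, which FILTER treats as false, so the sketch returns None for IRIs. The sample triples are made up; note that the 1799 date would have been dropped by the original `> "1800-01-01"` filter. If xsd:dateTime values are wanted too, SPARQL 1.1 allows `FILTER(datatype(?date) IN (xsd:date, xsd:dateTime))`.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical sample triples: literal objects are (lexical form, datatype)
# pairs, IRI objects are plain strings.
TRIPLES = [
    ("mydb:a", "mydb:born", ("1850-03-01", XSD + "date")),
    ("mydb:a", "mydb:name", ("Ada", XSD + "string")),
    ("mydb:b", "mydb:seen", ("1799-12-31", XSD + "date")),
    ("mydb:b", "mydb:knows", "mydb:a"),
]


def datatype(term):
    """Datatype IRI of a literal; None for IRIs (an error in SPARQL)."""
    return term[1] if isinstance(term, tuple) else None


# Mimic FILTER(datatype(?date) = xsd:date) over ?s ?p ?date.
dates = [(s, p, o) for s, p, o in TRIPLES if datatype(o) == XSD + "date"]
```

`dates` keeps both date-typed triples, regardless of how early the date is, and drops the string literal and the IRI.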

Good SPARQL query to find all triples with a resource as subject or object

I need to find all triples on DBpedia where http://dbpedia.org/resource/Benin is a subject or object. This query gives me the output that I want in a format that works the best for me (just three variables and no blank spaces):
PREFIX : <http://dbpedia.org/resource/>
SELECT * WHERE {
?s ?p ?o
FILTER (?s=:Benin || ?o=:Benin)
}
I get similar results if I have this query:
PREFIX : <http://dbpedia.org/resource/>
SELECT * WHERE {
{:Benin ?p ?o}
UNION
{?s ?p :Benin}
}
However, the formatting of the latter is off. It first gives me p and o output leaving s blank and then s and p leaving o blank. Also, the first query takes more time to execute. I will be grateful for an explanation of the mechanics of how the two queries work and why there is a difference in the output.
However, the formatting of the latter is off
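That's because the two branches of the UNION bind different variables: {:Benin ?p ?o} never binds ?s (the subject is the constant :Benin), and {?s ?p :Benin} never binds ?o. SELECT * returns a column for every variable in the query, so each row leaves one of those columns empty.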
You can resolve the problem by explicitly listing and selecting the variables:
PREFIX : <http://dbpedia.org/resource/>
SELECT ?s ?p ?o WHERE {
{
?s ?p ?o
FILTER (?s=:Benin)
}
UNION
{
?s ?p ?o .
FILTER (?o=:Benin)
}
}
Note that this is still much faster on dbpedia than the OR filter.
The union will return duplicates when a triple matches both filter expressions (i.e. :Benin ?p :Benin).
SELECT DISTINCT would remedy that at additional cost; since such triples do not seem to occur here, I omitted it for better performance.
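The following Python sketch simulates both UNION shapes over a tiny, made-up set of triples to show where the empty cells and the duplicates come from. The triple `:Benin :neighbour :Benin` is a hypothetical self-loop that matches both branches; the (s, p, o) column order is a choice made for the sketch, not something SPARQL guarantees for SELECT *.

```python
# Hypothetical triples; ":Benin :neighbour :Benin" is a made-up self-loop
# that matches both UNION branches.
TRIPLES = [
    (":Benin", "rdf:type", "dbo:Country"),
    (":Porto-Novo", "dbo:country", ":Benin"),
    (":Paris", "dbo:country", ":France"),
    (":Benin", ":neighbour", ":Benin"),
]


def union_constant_patterns():
    """{:Benin ?p ?o} UNION {?s ?p :Benin}: branches bind different variables."""
    left = [{"p": p, "o": o} for s, p, o in TRIPLES if s == ":Benin"]
    right = [{"s": s, "p": p} for s, p, o in TRIPLES if o == ":Benin"]
    # SELECT * projects every variable; unbound ones come out empty (None).
    return [(b.get("s"), b.get("p"), b.get("o")) for b in left + right]


def union_with_filters(distinct=False):
    """{?s ?p ?o FILTER(?s=:Benin)} UNION {?s ?p ?o FILTER(?o=:Benin)}."""
    rows = [t for t in TRIPLES if t[0] == ":Benin"]
    rows += [t for t in TRIPLES if t[2] == ":Benin"]
    if distinct:
        # SELECT DISTINCT: drop repeated rows, keeping first-seen order.
        rows = list(dict.fromkeys(rows))
    return rows
```

Every row of `union_constant_patterns()` has exactly one empty cell, while `union_with_filters()` returns complete rows but lists the self-loop twice; `distinct=True` reduces the four rows to three.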
Also, the first query takes more time to execute.
That's hard to say without the result of an EXPLAIN(), but my first guess would be that the equality filter is using the index, while the OR filter is using a full table scan. Virtuoso does not seem to generate good query plans for nested filters.
Try this --
PREFIX : <http://dbpedia.org/resource/>
DESCRIBE :Benin
-- or just --
DESCRIBE <http://dbpedia.org/resource/Benin>
You can also get the output in various other serializations, including N-Triples.