SPARQL group by a substring and average

I am querying a large data set (temperatures recorded hourly for nearly 20 years) and I'd rather get a summary, e.g. daily temperatures.
An example query is here:
http://www.boisvert.me.uk/opendata/sparql_aq+.html?pasteid=hu5rbc7W
PREFIX opensheff: <uri://opensheffield.org/properties#>
select ?time ?temp where {
?m opensheff:sensor <uri://opensheffield.org/datagrid/sensors/Weather_Mast/Weather_Mast.ic> ;
opensheff:rawValue ?temp ;
<http://purl.oclc.org/NET/ssnx/ssn#endTime> ?time .
FILTER (str(?time) > "2011-09-24")
}
ORDER BY ASC(?time)
And the results look like this:
time temp
"2011-09-24T00:00Z" 12.31
"2011-09-24T01:00Z" 11.68
"2011-09-24T02:00Z" 11.92
"2011-09-24T03:00Z" 11.59
Now I would like to group by a part of the date string, so as to get a daily average temperature:
time temp
"2011-09-24" 12.3 # or whatever
"2011-09-23" 11.7
"2011-09-22" 11.9
"2011-09-21" 11.6
So, how do I group by a substring of ?time?

Eventually solved it. Running here:
http://www.boisvert.me.uk/opendata/sparql_aq+.html?pasteid=j8m0Qk6s
Code:
PREFIX opensheff: <uri://opensheffield.org/properties#>
select ?d (AVG(?temp) AS ?day_temp)
where {
?m opensheff:sensor <uri://opensheffield.org/datagrid/sensors/Weather_Mast/Weather_Mast.ic> ;
opensheff:rawValue ?temp ;
<http://purl.oclc.org/NET/ssnx/ssn#endTime> ?time .
BIND( SUBSTR(str(?time), 1, 10) AS ?d ) .
}
GROUP BY ?d
ORDER BY ASC(?d)
We use BIND to set a new variable to the substring required, and then grouping and averaging by that variable is simple enough.
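To make the grouping step concrete, here is a minimal Python sketch of what the query computes: take the first 10 characters of each timestamp as the key, then average the temperatures per key. The first four rows are the sample results from the question; the fifth row and the helper name daily_averages are made up for illustration.

```python
from collections import defaultdict


def daily_averages(rows):
    """Group (time, temp) rows by the date part of the timestamp and average."""
    groups = defaultdict(list)
    for time, temp in rows:
        day = time[:10]  # same characters as SUBSTR(?time, 1, 10)
        groups[day].append(temp)
    # sorted() mirrors ORDER BY ASC(?d)
    return {day: sum(temps) / len(temps) for day, temps in sorted(groups.items())}


rows = [
    ("2011-09-24T00:00Z", 12.31),
    ("2011-09-24T01:00Z", 11.68),
    ("2011-09-24T02:00Z", 11.92),
    ("2011-09-24T03:00Z", 11.59),
    ("2011-09-25T00:00Z", 10.0),  # hypothetical row, not from the source data
]

for day, avg in daily_averages(rows).items():
    print(day, round(avg, 2))
```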

Related

SPARQL query: OR in FILTER?

I would like to search court cases based on their short title, but I've noticed in the RDF records that this information is sometimes stored under one property (cdm:expression_case-law_parties) and sometimes under another (cdm:expression_title_alternative). I would like to filter on both simultaneously. The query below, where I'm trying to use an OR (||) in the FILTER, does not work. What is the appropriate way?
PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
SELECT ?work ?expression ?ecli ?celex ?alttitle ?parties ?title
WHERE {
?work a ?class.
?expression cdm:expression_belongs_to_work ?work.
?expression cdm:expression_title ?title.
?expression cdm:expression_uses_language <http://publications.europa.eu/resource/authority/language/ENG>.
?work cdm:case-law_ecli ?ecli.
?work cdm:resource_legal_id_celex ?celex.
OPTIONAL{?expression cdm:expression_case-law_parties ?parties}
OPTIONAL{?expression cdm:expression_title_alternative ?alttitle}
FILTER(?class in (<http://publications.europa.eu/ontology/cdm#judgement>))
FILTER CONTAINS (?alttitle, "France v Commission") || (?parties, "France v Commission")}
LIMIT 15
From Stanislav Kralin's comment:
FILTER (CONTAINS (?alttitle, "France v Commission") || CONTAINS(?parties, "France v Commission"))
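Because both properties are matched with OPTIONAL, one of the two variables can be unbound, and CONTAINS on an unbound variable raises an error. The combined filter still works because SPARQL's || returns true when either side is true, even if the other side is an error. A minimal Python sketch of that rule follows; ERROR, sparql_contains, sparql_or and filter_keeps are hypothetical names that model the behaviour, not part of any real library.

```python
ERROR = object()  # stands for a SPARQL expression error


def sparql_contains(haystack, needle):
    """CONTAINS on an unbound variable (None here) is an error."""
    if haystack is None:
        return ERROR
    return needle in haystack


def sparql_or(left, right):
    """Logical-or over true / false / error."""
    if left is True or right is True:
        return True
    if left is ERROR or right is ERROR:
        return ERROR
    return False


def filter_keeps(value):
    """FILTER keeps a solution only when the expression is true."""
    return value is True


needle = "France v Commission"
# ?alttitle unbound, ?parties matches: the solution is kept.
print(filter_keeps(sparql_or(sparql_contains(None, needle),
                             sparql_contains("France v Commission", needle))))
# ?alttitle unbound, ?parties does not match: the solution is dropped.
print(filter_keeps(sparql_or(sparql_contains(None, needle),
                             sparql_contains("Italy v Council", needle))))
```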

SPARQL Restrict Number of Results for Specific Variable

Suppose I want to look for some first degree neighbors of Berlin. I ask the following query:
select ?s ?p where {
?s ?p dbr:Berlin.
}
Is it possible to put a restriction on the return result, such that there are at most 5 results for each unique value of ?p?
My attempts with subqueries all time out...
As a potentially useful, if not exactly perfect, solution: maybe GROUP_CONCAT, MAX/MIN or SAMPLE are of use?
SELECT
?writer (GROUP_CONCAT(?namestring; SEPARATOR = " ") AS ?namestrings)
(MIN(?namestring) AS ?min_name)
(MAX(?namestring) AS ?max_name)
(SAMPLE(?namestring) AS ?random_name)
(SAMPLE(?namestring) AS ?another_random_name_that_may_unfortunately_be_the_same_again)
WHERE {
?writer wdt:P31 wd:Q5;
wdt:P166 wd:Q37922;
wdt:P735 ?firstname.
?firstname wdt:P1705 ?namestring.
}
GROUP BY ?writer
HAVING ((COUNT(?writer)) > 2 )
LIMIT 20
See it live here.
And, as you can see, SAMPLE is apparently evaluated only once, so using it repeatedly does not get you closer to five (different) samples.
(You can leave out the HAVING for your use. I only included it to restrict the results to useful examples.)

Aggregate functions in Sparql query with empty records

I've been trying to run a SPARQL query against https://landregistry.data.gov.uk/app/qonsole# to get results for sold properties.
The query is the following:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ukhpi: <http://landregistry.data.gov.uk/def/ukhpi/>
SELECT sum(?ukhpi_salesVolume)
WHERE
{ { SELECT ?ukhpi_refMonth ?item
WHERE
{ ?item ukhpi:refRegion <http://landregistry.data.gov.uk/id/region/haringey> ;
ukhpi:refMonth ?ukhpi_refMonth
FILTER ( ?ukhpi_refMonth >= "2019-03"^^xsd:gYearMonth )
FILTER ( ?ukhpi_refMonth < "2020-03"^^xsd:gYearMonth )
}
}
OPTIONAL
{ ?item ukhpi:salesVolume ?ukhpi_salesVolume }
}
The problem is that the result is empty. However, if I run the same query without the SUM in the SELECT clause, I can see there are 11 integer records.
My guess is that there is a 12th, empty record which causes the problem in the SUM operation, but SPARQL is not my strongest side, so I'm not sure how to filter this (and remove any empty records), if that's really the problem.
I've also noticed that most of the other aggregate functions (MIN, MAX, AVG) do not work either. COUNT does, and returns 11.
I actually solved this myself: all that was needed was a COALESCE, which apparently exists in SPARQL too.
So:
SELECT sum(COALESCE(?ukhpi_salesVolume, 0))
instead of just
SELECT sum(?ukhpi_salesVolume)
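A rough Python model of why this helps, assuming the behaviour described above: SUM fails as a whole when any row has no value, COUNT only counts rows that have one, and COALESCE substitutes a default. The sales figures are invented, None stands for the unbound variable in the twelfth row, and the function names are hypothetical.

```python
def sparql_sum(values):
    """Model of SUM: one unbound (None) input makes the whole aggregate fail."""
    if any(v is None for v in values):
        return None  # the aggregate errors, so the result cell is empty
    return sum(values)


def sparql_count(values):
    """Model of COUNT(?x): only bound values are counted."""
    return sum(1 for v in values if v is not None)


def coalesce(*args):
    """Return the first argument that is bound (not None)."""
    for arg in args:
        if arg is not None:
            return arg
    return None


# Eleven made-up monthly volumes plus one month with no salesVolume triple.
volumes = [120, 95, 101, 88, 130, 142, 110, 99, 105, 97, 123, None]

print(sparql_sum(volumes))                             # empty result
print(sparql_count(volumes))                           # bound rows only
print(sparql_sum([coalesce(v, 0) for v in volumes]))   # total over bound rows
```

Dropping the OPTIONAL around the salesVolume pattern would likely have a similar effect, since rows without a value would then not match at all.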

SPARQL for unique set of values from all columns and rows

I have a query that returns several columns, i.e.:
SELECT ?a ?b ?c
WHERE { ... }
Every column variable is an IRI. Obviously this returns a unique row for every combination of column values (note that values may not be unique to a column):
<urn:id:x:1> <urn:id:a:2> <urn:id:j:3>
<urn:id:x:1> <urn:id:a:2> <urn:id:j:4>
<urn:id:x:1> <urn:id:j:4> <urn:id:k:5>
<urn:id:y:2> <urn:id:j:4> <urn:id:k:6>
...
However, all I need are the unique IRIs spanning all rows and columns. i.e.:
<urn:id:x:1>
<urn:id:a:2>
<urn:id:j:3>
<urn:id:j:4>
<urn:id:k:5>
<urn:id:y:2>
<urn:id:k:6>
...
Is it possible to achieve this using SPARQL, or do I need to post-process the results to merge and deduplicate the values? Order is unimportant.
SELECT DISTINCT ?d {
...
VALUES ?i { 1 2 3 }
BIND (if(?i=1, ?a, if(?i=2, ?b, ?c)) AS ?d)
}
What does this do?
The VALUES clause creates three copies of each solution and numbers them with a variable ?i
The BIND clause creates a new variable ?d whose value is ?a, ?b or ?c, depending on whether ?i is 1, 2 or 3 in the given solution
The SELECT DISTINCT ?d returns only ?d and removes duplicates
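The three steps above can be mirrored in a short Python sketch, using the four example rows from the question. The function name unique_values is hypothetical, and the sketch keeps first-seen order only for readability (SPARQL gives no ordering guarantee here).

```python
from itertools import product

rows = [
    ("urn:id:x:1", "urn:id:a:2", "urn:id:j:3"),
    ("urn:id:x:1", "urn:id:a:2", "urn:id:j:4"),
    ("urn:id:x:1", "urn:id:j:4", "urn:id:k:5"),
    ("urn:id:y:2", "urn:id:j:4", "urn:id:k:6"),
]


def unique_values(rows):
    """Collect the distinct values found in any column of any row."""
    seen = []
    # VALUES ?i { 1 2 3 }: every row is repeated once per value of i.
    for (a, b, c), i in product(rows, (1, 2, 3)):
        # BIND (if(?i=1, ?a, if(?i=2, ?b, ?c)) AS ?d)
        d = a if i == 1 else (b if i == 2 else c)
        # SELECT DISTINCT ?d
        if d not in seen:
            seen.append(d)
    return seen


for value in unique_values(rows):
    print(value)
```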

Remove time from datetime in SPARQL

I am writing a query in SPARQL and I want to compare the date value without the time. Currently, I am getting a datetime value such as 2014-08-14T13:00:00Z. However, I want to do a filter on the date such as
FILTER (?date = "2014-08-15"^^xsd:dateTime)
I am new to SPARQL, so I need some help. Thanks.
EDITED
Thanks for the response guys. My apologies for the xsd:dateTime
FILTER (?date = "2014-08-15"^^xsd:date)
I have decided to try the following although I wanted a much 'prettier' solution.
FILTER (?date >= "2014-08-15T00:00:00Z"^^xsd:dateTime && ?date <= "2014-08-15T24:00:00Z"^^xsd:dateTime)
You can do this, but not exactly the way your question asks. The literal forms for dateTimes have to have all the fields (except the timezone), so "2014-08-15"^^xsd:dateTime isn't actually a legal dateTime. See the definition for more about the date time format.
That said, it's easy enough to pull out the year, month, and day from a datetime and put them back together into a date that you can compare with other dates:
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?dt ?date where {
values ?dt { "2011-01-10T14:45:13.815-05:00"^^xsd:dateTime }
bind(xsd:date(concat(str(year(?dt)),"-",
str(month(?dt)),"-",
str(day(?dt))))
as ?date)
}
--------------------------------------------------------------------------
| dt | date |
==========================================================================
| "2011-01-10T14:45:13.815-05:00"^^xsd:dateTime | "2011-01-10"^^xsd:date |
--------------------------------------------------------------------------
If you want to include the timezone, you can do that too; they're permitted in xsd:dates.
If you wanted to filter without creating the new date, you could also do something like
filter (year(?dt) = 2015 &&
month(?dt) = 01 &&
day(?dt) = 10)
That might be a fairly clean solution.
A note about your filter, though. You can filter the value of a variable against a constant like you did, but that often (but not always) suggests an easier way. For instance, instead of:
select ?s where {
?s a ?o .
filter ( ?o = <something> )
}
you'd usually just use the value in place, or use values to specify the value of a variable:
select ?s where {
?s a <something> .
}
select ?s where {
values ?o { <something> }
?s a ?o .
}
You could try a simple cast to xsd:date, but the SPARQL engine will likely retain the time. So it becomes a matter of parsing. One way is to just use SUBSTR, as the number of characters is known (string positions in SPARQL start at 1, so the date is the first 10 characters):
FILTER (xsd:date(SUBSTR(str(?date), 1, 10)) = "2014-08-15"^^xsd:date)
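SUBSTR in SPARQL follows XPath fn:substring, where the first character is at position 1. A small Python model of that rule (sparql_substr is a hypothetical helper, integer arguments only) suggests that start 1 with length 10 selects the date part, and that start 0 with length 11 happens to select the same ten characters:

```python
def sparql_substr(s, start, length):
    """Model of SUBSTR for integer arguments, following XPath fn:substring.

    Positions are 1-based; the character at position p is kept when
    start <= p < start + length.
    """
    return "".join(
        ch for p, ch in enumerate(s, start=1) if start <= p < start + length
    )


stamp = "2014-08-14T13:00:00Z"
print(sparql_substr(stamp, 1, 10))
print(sparql_substr(stamp, 0, 11))
```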
Another is to build the date from datetime format:
FILTER (xsd:date(CONCAT(str(YEAR(?date)), "-", str(MONTH(?date)), "-", str(DAY(?date)))) = "2014-08-15"^^xsd:date)
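Perhaps not as convenient given that CONCAT requires string conversion, but the general idea is to build the string from the datetime value and cast to xsd:date. Note that MONTH and DAY return integers, so single-digit values are not zero-padded; the xsd:date lexical form requires two digits for each, so a strict engine may reject a string such as "2014-8-5".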