SPARQL query for average of time duration

How can I compute the average of time durations in SPARQL? My durations are expressed as xsd:duration, e.g. "PT4H15M53S"^^xsd:duration.
The following query does not return anything:
SELECT (AVG(?Duration) AS ?avg)
WHERE {
?Time rdf:type tl:Interval ;
tl:duration ?Duration .
}

It may not be very clean, but it is nonetheless compatible with standard SPARQL, if you use regular expressions to extract the components of the duration:
SELECT * WHERE {
BIND("P20DT14H15M53S" AS ?duration)
BIND(IF(STRSTARTS(?duration, "-"), -1, 1) AS ?coef)
BIND(COALESCE(xsd:integer(REPLACE(?duration, "^.*[A-Z](\\d+)D.*$", "$1")), 0) AS ?d)
BIND(COALESCE(xsd:integer(REPLACE(?duration, "^.*T(\\d+)H.*$", "$1")), 0) AS ?h)
BIND(COALESCE(xsd:integer(REPLACE(?duration, "^.*T(?:|.*[A-Z])(\\d+)M.*$", "$1")), 0) AS ?m)
BIND(COALESCE(xsd:decimal(REPLACE(?duration, "^.*T(?:|.*[A-Z])(\\d*\\.?\\d*)S.*$", "$1")), 0) AS ?s)
BIND(?coef * ((((?d * 24 + ?h) * 60) + ?m) * 60 + ?s) AS ?total)
}
This converts the duration into a total number of seconds, which you can use in any calculation you wish (an average, for instance) before converting the result back to a duration expressed in seconds.
Note that month-based components (years and months) are not taken into account, since an average is not defined for that subset (you cannot have fractional months). In other words, this only makes sense for values of xsd:dayTimeDuration.
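The same conversion can be sketched outside SPARQL. The following is a minimal Python sketch, assuming the durations have already been fetched as plain strings and contain only day and time components; the helper names duration_to_seconds and seconds_to_duration are made up for this illustration.

```python
import re
from decimal import Decimal

# Day/time components only, like the query above; years and months are rejected.
_DURATION = re.compile(
    r"^(?P<sign>-)?P(?:(?P<d>\d+)D)?"
    r"(?:T(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?)?$"
)

def duration_to_seconds(text):
    """Convert a day-time duration string such as 'PT4H15M53S' to seconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"not a day-time duration: {text!r}")
    days = int(match["d"] or 0)
    hours = int(match["h"] or 0)
    minutes = int(match["m"] or 0)
    seconds = Decimal(match["s"] or 0)
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return -total if match["sign"] else total

def seconds_to_duration(total):
    """Render a number of seconds as a seconds-only duration, e.g. 'PT15354S'."""
    sign = "-" if total < 0 else ""
    return f"{sign}PT{abs(Decimal(total)).normalize():f}S"

durations = ["PT4H15M53S", "PT4H15M55S"]
seconds = [duration_to_seconds(d) for d in durations]
average = sum(seconds) / len(seconds)
print(seconds_to_duration(average))
```

Using Decimal rather than float keeps fractional seconds exact, which mirrors the xsd:decimal cast used for ?s in the query.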

Related

Multiple filters in SELECT query using RDF4J

I am trying to filter through the start date and end date using multiple filters within one query.
I want to find how many terms are within 3 months of ending, so the SPARQL query I am trying to recreate is:
SELECT (COUNT(?term) AS ?count) WHERE {
?term a :Term .
?term :startDate ?startDate .
?term :endDate ?endDate .
FILTER(?endDate < NOW() + "P3M"^^xsd:duration && ?endDate >= NOW() && ?startDate < NOW()) }
This is what I have so far, and I'm also not sure how to include the +3months within the query.
The graph pattern and other variables have been initialised and are not causing a problem.
Variable count = SparqlBuilder.var("count");
Aggregate countAgg = Expressions.count(contract);
Projection select = SparqlBuilder.select(countAgg.as(count));
Expression<?> nowFunc = Expressions.function(SparqlFunction.NOW );
SelectQuery activeQuery = Queries.SELECT().prefix(lg).
select(countAgg.as(count)).
where((activePattern.filter(Expressions.gte(endDate, nowFunc))).and(activePattern.filter(Expressions.lt(startDate, nowFunc))));
I get a "java.lang.StackOverflowError". I thought I'd try to include the +3 months by using the expression below, but it doesn't work as it is the function that returns the month of an argument.
Expression<?> threeMonths = Expressions.function(SparqlFunction.MONTH);
EDIT (20/05/2020): The stack overflow error is not the problem, and has been tested. It is being thrown because the query is not being built properly, and both filters are not being applied. The main issue is I can't figure out how to apply two filters within one query. This is the code that has been included:
SelectQuery activeQuery = Queries.SELECT().prefix(lg).select(countAgg.as(count)).where(activePattern.filter(Expressions.gt(endDate, nowFunc)), activePattern.filter(Expressions.lt(startDate, nowFunc)));
and this is the query that shows up:
SELECT ( COUNT( ?term ) AS ?count )
WHERE { ?term a Term .
?term :contractEndDate ?endDate .
?term :contractStartDate ?startDate .
FILTER ( ?startDate < NOW() ) }

SPARQL Restrict Number of Results for Specific Variable

Suppose I want to look for some first degree neighbors of Berlin. I ask the following query:
select ?s ?p where {
?s ?p dbr:Berlin.
}
Is it possible to put a restriction on the return result, such that there are at most 5 results for each unique value of ?p?
My attempts with subqueries all time out...
As a potentially useful, if not exactly perfect, solution, maybe GROUP_CONCAT, MAX/MIN or SAMPLE are of use:
SELECT
?writer (GROUP_CONCAT(?namestring; SEPARATOR = " ") AS ?namestrings)
(MIN(?namestring) AS ?min_name)
(MAX(?namestring) AS ?max_name)
(SAMPLE(?namestring) AS ?random_name)
(SAMPLE(?namestring) AS ?another_random_name_that_may_unfortunately_be_the_same_again)
WHERE {
?writer wdt:P31 wd:Q5;
wdt:P166 wd:Q37922;
wdt:P735 ?firstname.
?firstname wdt:P1705 ?namestring.
}
GROUP BY ?writer
HAVING ((COUNT(?writer)) > 2 )
LIMIT 20
Note that SAMPLE is apparently evaluated only once, so using it repeatedly does not get you closer to five (different) samples.
(You can leave out the HAVING for your use. I only included it to restrict the results to useful examples.)
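If the cap does not have to be enforced by the endpoint, it can also be applied client-side once the rows have been fetched. The following is a minimal Python sketch, assuming the results are available as (s, p) pairs; cap_per_predicate is a hypothetical helper, not part of any SPARQL library.

```python
from collections import defaultdict

def cap_per_predicate(rows, limit=5):
    """Keep at most `limit` (s, p) rows for each distinct value of p."""
    kept = []
    seen = defaultdict(int)  # number of rows kept so far for each predicate
    for s, p in rows:
        if seen[p] < limit:
            seen[p] += 1
            kept.append((s, p))
    return kept

# Eight made-up rows for one predicate and one row for another.
rows = [(f"s{i}", "p1") for i in range(8)] + [("x", "p2")]
print(len(cap_per_predicate(rows)))
```

This keeps the first rows encountered for each predicate, so which five survive depends on the order in which the endpoint returns them.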

Aggregate functions in Sparql query with empty records

I've been trying to run a SPARQL query against https://landregistry.data.gov.uk/app/qonsole# to get the sales volume of sold properties.
The query is the following:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ukhpi: <http://landregistry.data.gov.uk/def/ukhpi/>
SELECT sum(?ukhpi_salesVolume)
WHERE
{ { SELECT ?ukhpi_refMonth ?item
WHERE
{ ?item ukhpi:refRegion <http://landregistry.data.gov.uk/id/region/haringey> ;
ukhpi:refMonth ?ukhpi_refMonth
FILTER ( ?ukhpi_refMonth >= "2019-03"^^xsd:gYearMonth )
FILTER ( ?ukhpi_refMonth < "2020-03"^^xsd:gYearMonth )
}
}
OPTIONAL
{ ?item ukhpi:salesVolume ?ukhpi_salesVolume }
}
The problem is that the result is empty. However, if I run the same query without the SUM on the 4th line, I can see there are 11 integer records.
My guess is that there is a 12th, empty record that breaks the SUM operation, but SPARQL is not my strongest side, so I'm not sure how to filter this (and remove any empty records) if that's really the problem.
I've also noticed that most of the other aggregate functions (MIN, MAX, AVG) do not work either. COUNT does, and returns 11.
I actually solved this myself: all that was needed was a COALESCE, which apparently exists in SPARQL too.
So:
SELECT sum(COALESCE(?ukhpi_salesVolume, 0))
instead of just
SELECT sum(?ukhpi_salesVolume)
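The fix suggests that the unbound ?ukhpi_salesVolume left by the OPTIONAL was what emptied the SUM. Below is a rough Python model of that behaviour, assuming the endpoint drops the aggregate as soon as one input is unbound. The names strict_sum and coalesce are illustrative helpers, not SPARQL or library functions, and the numbers are made up.

```python
def strict_sum(values):
    """Model of an aggregate that yields no result if any input is unbound (None)."""
    if any(v is None for v in values):
        return None
    return sum(values)

def coalesce(value, default):
    """Return `default` when `value` is unbound (None), like COALESCE(?x, 0)."""
    return default if value is None else value

# Made-up sales volumes; None stands for the row with no ukhpi:salesVolume.
volumes = [10, 20, None]
print(strict_sum(volumes))
print(strict_sum([coalesce(v, 0) for v in volumes]))
```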

Remove time from datetime in SPARQL

I am writing a query in SPARQL and I want to compare the date value without the time. Currently, I am getting a datetime value such as 2014-08-14T13:00:00Z. However, I want to do a filter on the date such as
FILTER (?date = "2014-08-15"^^xsd:dateTime)
I am new to SPARQL, so I need some help. Thanks.
EDITED
Thanks for the response guys. My apologies for the xsd:dateTime
FILTER (?date = "2014-08-15"^^xsd:date)
I have decided to try the following although I wanted a much 'prettier' solution.
FILTER (?date >= "2014-08-15T00:00:00Z"^^xsd:dateTime && ?date <= "2014-08-15T24:00:00Z"^^xsd:dateTime)
You can do this, but not exactly the way your question asks. The literal forms for dateTimes have to have all the fields (except the timezone), so "2014-08-15"^^xsd:dateTime isn't actually a legal dateTime. See the definition for more about the date time format.
That said, it's easy enough to pull out the year, month, and day from a datetime and put them back together into a date that you can compare with other dates:
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?dt ?date where {
values ?dt { "2011-01-10T14:45:13.815-05:00"^^xsd:dateTime }
bind(xsd:date(concat(str(year(?dt)),"-",
str(month(?dt)),"-",
str(day(?dt))))
as ?date)
}
--------------------------------------------------------------------------
| dt | date |
==========================================================================
| "2011-01-10T14:45:13.815-05:00"^^xsd:dateTime | "2011-01-10"^^xsd:date |
--------------------------------------------------------------------------
If you want to include the timezone, you can do that too; they're permitted in xsd:dates.
If you wanted to filter without creating the new date, you could also do something like
filter (year(?dt) = 2015 &&
month(?dt) = 01 &&
day(?dt) = 10)
That might be a fairly clean solution.
A note about your filter, though. You can filter the value of a variable against a constant like you did, but that often (but not always) suggests an easier way. For instance, instead of:
select ?s where {
?s a ?o .
filter ( ?o = <something> )
}
you'd usually just use the value in place, or use values to specify the value of a variable:
select ?s where {
?s a <something> .
}
select ?s where {
values ?o { <something> }
?s a ?o .
}
You could try a simple cast to xsd:date, but the SPARQL engine will likely retain the time. So it becomes a matter of parsing. One way is to just use SUBSTR, as the number of characters is known (SPARQL string positions start at 1):
FILTER (xsd:date(SUBSTR(str(?date), 1, 10)) = "2014-08-15"^^xsd:date)
Another is to build the date from datetime format:
FILTER (xsd:date(CONCAT(str(YEAR(?date)), "-", str(MONTH(?date)), "-", str(DAY(?date)))) = "2014-08-15"^^xsd:date)
Perhaps not as convenient given that CONCAT requires string conversion, but the general idea is to build the string from the datetime value and cast to xsd:date.
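One detail to watch when building the string by hand: MONTH and DAY return integers, so str(MONTH(?date)) yields "1" rather than "01", and "2011-1-10" is not a valid xsd:date lexical form. Some engines tolerate this (the output shown earlier comes from one that did), but a stricter one may refuse the cast. The following is a minimal Python sketch of the same year/month/day reassembly with explicit zero-padding; date_part is a hypothetical helper name.

```python
from datetime import datetime

def date_part(lexical):
    """Return the zero-padded YYYY-MM-DD date of an xsd:dateTime string."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    parsed = datetime.fromisoformat(lexical.replace("Z", "+00:00"))
    # No timezone conversion: the date is taken as written in the literal.
    return f"{parsed.year:04d}-{parsed.month:02d}-{parsed.day:02d}"

print(date_part("2011-01-10T14:45:13.815-05:00"))
print(date_part("2014-08-14T13:00:00Z"))
```

The SUBSTR variant sidesteps the padding question, because it copies the already padded characters from the original literal.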

SPARQL group by a substring and average

I am querying a large data set (temperatures recorded hourly for nearly 20 years) and I'd rather get a summary, e.g. daily temperatures.
An example query is here:
http://www.boisvert.me.uk/opendata/sparql_aq+.html?pasteid=hu5rbc7W
PREFIX opensheff: <uri://opensheffield.org/properties#>
select ?time ?temp where {
?m opensheff:sensor <uri://opensheffield.org/datagrid/sensors/Weather_Mast/Weather_Mast.ic> ;
opensheff:rawValue ?temp ;
<http://purl.oclc.org/NET/ssnx/ssn#endTime> ?time .
FILTER (str(?time) > "2011-09-24")
}
ORDER BY ASC(?time)
And the results look like this:
time temp
"2011-09-24T00:00Z" 12.31
"2011-09-24T01:00Z" 11.68
"2011-09-24T02:00Z" 11.92
"2011-09-24T03:00Z" 11.59
Now I would like to group by a part of the date string, so as to get a daily average temperature:
time temp
"2011-09-24" 12.3 # or whatever
"2011-09-23" 11.7
"2011-09-22" 11.9
"2011-09-21" 11.6
So, how do I group by a substring of ?time ?
Eventually solved it. Running here:
http://www.boisvert.me.uk/opendata/sparql_aq+.html?pasteid=j8m0Qk6s
Code:
PREFIX opensheff: <uri://opensheffield.org/properties#>
select ?d (AVG(?temp) AS ?day_temp)
where {
?m opensheff:sensor <uri://opensheffield.org/datagrid/sensors/Weather_Mast/Weather_Mast.ic> ;
opensheff:rawValue ?temp ;
<http://purl.oclc.org/NET/ssnx/ssn#endTime> ?time .
BIND( SUBSTR(?time, 1, 10) AS ?d ) .
}
GROUP BY ?d
ORDER BY ASC(?d)
We use BIND to set a new variable to the substring required, and then grouping and averaging by that variable is simple enough.
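The same bucket-then-average idea can be checked client-side. The following is a minimal Python sketch, assuming the hourly readings have been fetched as (time, temp) pairs; daily_average is a hypothetical helper and the sample values are made up.

```python
from collections import defaultdict
from statistics import mean

def daily_average(readings):
    """Group (time, temp) pairs by the first 10 characters of time and average."""
    buckets = defaultdict(list)
    for time, temp in readings:
        # Same key as SUBSTR(?time, 1, 10): the YYYY-MM-DD prefix.
        buckets[time[:10]].append(temp)
    return {day: mean(temps) for day, temps in sorted(buckets.items())}

readings = [
    ("2011-09-24T00:00Z", 12.0),
    ("2011-09-24T01:00Z", 11.0),
    ("2011-09-25T00:00Z", 10.0),
]
print(daily_average(readings))
```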