SPARQL filter xsd:Date fails - sparql

I have the following SPARQL query
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?d
WHERE { GRAPH <https:/my/triples>{
?s <http://my/ontology/v1.0#hasTimestamp> ?d .
FILTER(?d > "1945-01-01"^^xsd:date)
}
}
When I execute the above, it fails to filter the results by date correctly. In fact, it returns no results at all, and no errors.
If I remove the date filter I get this response:
"bindings": [
{
"s": { "type": "uri" , "value": "seo:S2A_MSI_2019_11_28_09_33_31_T33SXB_t_dogliotti" } ,
"d": { "datatype": "xsd:date" , "type": "typed-literal" , "value": "2019-11-28" }
}
What could be wrong?

I tried all of the suggestions in the comments but I couldn't make it work properly.
I stumbled upon a post from 2011 (!!) that partially gave me the solution.
The post is here, and it states that the solution below (i.e. "stringifying" the date) works only for equality
FILTER(str(?d) >= "2018") .
and casting to xsd:dateTime works for comparisons.
In my case the first solution worked for comparisons (including equality), while casting fails miserably.
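A minimal sketch of why the string comparison can work for ordering too, assuming every value is a plain ISO 8601 calendar date (YYYY-MM-DD) like the one in the response above. Such strings are fixed-width and zero-padded, so lexicographic order coincides with chronological order. The sample values and the helper name `after_or_equal` are made up for illustration.

```python
from datetime import date

# Hypothetical sample values in the same lexical form as the ?d bindings.
values = ["2019-11-28", "1944-06-06", "2018-01-01", "2017-12-31"]


def after_or_equal(lexical: str, threshold: str) -> bool:
    """Mimic FILTER(str(?d) >= threshold) with a plain string comparison."""
    return lexical >= threshold


# String comparison, as in FILTER(str(?d) >= "2018").
kept_by_string = [v for v in values if after_or_equal(v, "2018")]

# Cross-check against a real date comparison.
kept_by_date = [v for v in values if date.fromisoformat(v) >= date(2018, 1, 1)]

print(kept_by_string)
print(kept_by_string == kept_by_date)
```

The equivalence is likely to break as soon as the lexical forms vary, for example with time zone suffixes, negative years, or years with more than four digits.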

Related

Numbers returned by MarkLogic do not use scientific notation

Write the following RDF data into MarkLogic:
<http://John> <http://have> 2.1123519E7 .
Then execute the following query:
SELECT *
WHERE { ?s ?p ?o. }
The query result is:
?s | ?p | ?o
<http://John> | <http://have> | "21123519"^^xsd:double
The expected query result is:
?s | ?p | ?o
<http://John> | <http://have> | "2.1123519E7"^^xsd:double
Alternatives
The data was written in scientific notation, but the value returned in the query result is not in scientific notation. Is this an inconsistency?
Executing the SPARQL query with the Java Client API returns the unexpected result, but executing it in Query Console returns the expected result.
The query also returns the expected result in Apache Jena and RDF4J.
Can someone give me an answer or a hint about it?
As Mads points out: the numbers are the same.
However, this is an interesting one.
MarkLogic does keep and understand the scientific notation. The REST API also handles this for many return types. I tested your query against the /v1/sparql endpoint:
Accept Header: application/sparql-results+xml
...
<result>
<binding name="s">
<uri>http://John</uri>
</binding>
<binding name="p">
<uri>http://have</uri>
</binding>
<binding name="o">
<literal datatype="http://www.w3.org/2001/XMLSchema#double">2.1123519E7</literal>
</binding>
</result>
...
Accept Header: text/csv
s,p,o
http://John,http://have,2.1123519E7
The same holds for HTML, etc.
However, for JSON, things are different:
{
"s": {
"type": "uri",
"value": "http://John"
},
"p": {
"type": "uri",
"value": "http://have"
},
"o": {
"datatype": "http://www.w3.org/2001/XMLSchema#double",
"type": "literal",
"value": "21123519"
}
}
This matches the fact that scientific notation is not preserved when a double passes through JSON:
JSON.parse('{ "foo" : 2.1123519E7}')
// returns:
{
"foo": 21123519
}
So, it all comes down to how you are requesting your results in your call to MarkLogic. Some response types return what you expect. At least one (JSON) does not. At this point, I suggest opening a ticket under the Java API project: https://github.com/marklogic/java-client-api/issues
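To make the "the numbers are the same" point concrete, here is a small sketch in Python (not the Java Client API). It only shows that both lexical forms denote the same double, and that a parsed JSON number does not remember how it was written; it does not reproduce what MarkLogic does internally.

```python
import json

# Both lexical forms from the question denote the same double value.
scientific = float("2.1123519E7")
plain = float("21123519")

# A JSON number is parsed to a value; the original notation is not kept.
parsed = json.loads('{"foo": 2.1123519E7}')["foo"]

print(scientific == plain)
print(parsed == plain)
```

So a client that compares values rather than lexical forms should see no difference between the two results.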

SPARQL: Conditional INSERT

I'm trying to create a SPARQL statement that inserts some triples only if a certain pattern isn't yet in the graph.
PREFIX ssb: <ssb:ontology:>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT {
<ssb:message/some=> rdf:type ssb:Message;
ssb:seq 7;
ssb:raw "some text";
ssb:author 1.
} WHERE {
FILTER NOT EXISTS {
[] ssb:seq 7; ssb:author 1
}
}
Unfortunately, this seems to create the new triples even if a resource with that ssb:seq and ssb:author already exists. I tried this with quadstorejs and with Oxigraph.
Any suggestion on how to perform such a conditional insert? The goal is that I don't end up with several resources having the same sequence number and author.
I believe your first attempt is correct and it looks like a bug in the systems that you tried.
The algebra for
FILTER NOT EXISTS {
[] ssb:seq 7; ssb:author 1
}
is a FILTER directly on top of the Singleton, and it must return a single (empty) solution when [] ssb:seq 7; ssb:author 1 does not match the data. Since there are no variables in your INSERT template, the data should be inserted.
The version with OPTIONAL isn't much different: there's an implicit {} before the OPTIONAL, and it's the same Singleton.
I just tried a CONSTRUCT version of your 1st query with Stardog and it worked as expected.
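The evaluation described above can be sketched as a toy model in Python. This is not a SPARQL engine, and the names `not_exists_insert` and `has_seq7_author1` are made up for illustration; triples are plain tuples with prefixed names as strings.

```python
def not_exists_insert(graph: set, pattern_matches, template: list) -> int:
    """Model INSERT { template } WHERE { FILTER NOT EXISTS { pattern } }.

    The empty WHERE group yields one solution with no bindings (the Singleton).
    FILTER NOT EXISTS keeps it only if the pattern has no match in the graph.
    A ground template is then instantiated once per surviving solution.
    """
    solutions = [{}]  # the Singleton: exactly one empty solution
    solutions = [s for s in solutions if not pattern_matches(graph)]
    inserted = 0
    for _ in solutions:
        for triple in template:
            if triple not in graph:
                graph.add(triple)
                inserted += 1
    return inserted


def has_seq7_author1(graph: set) -> bool:
    """True if some subject has both ssb:seq 7 and ssb:author 1."""
    with_seq = {s for (s, p, o) in graph if p == "ssb:seq" and o == 7}
    with_author = {s for (s, p, o) in graph if p == "ssb:author" and o == 1}
    return bool(with_seq & with_author)


template = [
    ("ssb:message/some=", "rdf:type", "ssb:Message"),
    ("ssb:message/some=", "ssb:seq", 7),
    ("ssb:message/some=", "ssb:raw", "some text"),
    ("ssb:message/some=", "ssb:author", 1),
]

graph = set()
# First run: the pattern is absent, so the template is inserted.
first = not_exists_insert(graph, has_seq7_author1, template)

# Second run with a different subject: the pattern now exists, nothing is added.
other = [("ssb:message/other", p, o) for (_, p, o) in template]
second = not_exists_insert(graph, has_seq7_author1, other)

print(first, second, len(graph))
```

A store that follows these semantics should insert on the first run and do nothing on the second.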
I found a solution that seems to work:
PREFIX ssb: <ssb:ontology:>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT {
<ssb:message/some=> rdf:type ssb:Message;
ssb:seq 7;
ssb:raw "some text";
ssb:author 1.
} WHERE {
OPTIONAL {?x ssb:seq 7; ssb:author 1.}
FILTER (!BOUND(?x))
}
Not sure exactly why, though. I mean, the WHERE clause either matches nothing because the pattern isn't there, or the match is filtered out when it exists.

Exact match of variable string in SPARQL Wikidata Query Service

Exact match of variable string in SPARQL Wikidata Query Service at https://query.wikidata.org does not give the results I expected.
I was expecting I could do:
SELECT * {
hint:Query hint:optimizer "None" .
{ SELECT DISTINCT (xsd:string(?author_name_) AS ?author_name) { wd:Q5565155 skos:altLabel ?author_name_ . } }
?work wdt:P2093 ?author_name .
}
But I get no results from the Wikidata Query Service.
However, if I use the "=" comparison, I can match the strings:
SELECT * {
hint:Query hint:optimizer "None" .
{ SELECT DISTINCT (xsd:string(?author_name_) AS ?author_name) { wd:Q5565155 skos:altLabel ?author_name_ . } }
?work wdt:P50 wd:Q5565155 .
?work wdt:P2093 ?author_name__ .
FILTER (?author_name = ?author_name__)
}
With the current data in Wikidata, I get five rows returned in this query.
Another way to get this data is by using a BIND:
SELECT * {
BIND("Knudsen GM" AS ?author_name)
?work wdt:P2093 ?author_name .
}
I suppose there might be something wrong with the casting as this does not return anything:
SELECT * {
BIND(xsd:string("Knudsen GM") AS ?author_name)
?work wdt:P2093 ?author_name .
}
Replacing xsd:string with STR, or using no conversion at all in the original query, does not yield any result rows either.

how to programmatically get all available information from a Wikidata entity?

I'm really new to Wikidata. I just figured out that Wikidata uses a lot of reification.
Suppose we want to get all information available for Obama. If we are going to do it from DBpedia, we would just use a simple query:
select * where {<http://dbpedia.org/resource/Barack_Obama> ?p ?o .}
This would return all the properties and values with Obama being the subject. Essentially the result is the same as this page: http://dbpedia.org/page/Barack_Obama, but the query result is in the format I need.
I'm wondering how to do the same thing with Wikidata. This is the Wikidata page for Obama: https://www.wikidata.org/wiki/Q76. Let's say I want all the statements on this page. But almost all the statements on this page are reified in that they have ranks and qualifiers, etc. For example, for the "educated at" part, it not only has the school, but also the "start time" and "end time" and all schools are ranked as normal since Obama is not in these schools anymore.
I could just get all the schools by getting the truthy statements (using https://query.wikidata.org):
SELECT ?school ?schoolLabel WHERE {
wd:Q76 wdt:P69 ?school .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
The above query will simply return all the schools.
If I want to get the start time and end time of the school, I need to do this:
SELECT ?school ?schoolLabel ?start ?end WHERE {
wd:Q76 p:P69 ?school_statement .
?school_statement ps:P69 ?school .
?school_statement pq:P580 ?start .
?school_statement pq:P582 ?end .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
But the thing is, without looking at the actual page, how would I know that the ?school_statement has pq:P580 and pq:P582, namely the "start time" and "end time"? It all comes down to one question: how do I get all the information (including reification) from https://www.wikidata.org/wiki/Q76?
Ultimately, I would expect a table like this:
||predicate||object||objectLabel||qualifier1||qualifier1Value||qualifier2||qualifier2Value||...
You should probably go for the Wikidata data API (more specifically the wbgetentities module) instead of the SPARQL endpoint.
In your case:
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q76
You should find all the qualifier data you were looking for. Example with entities.Q76.claims.P69.1:
{ mainsnak:
{ snaktype: 'value',
property: 'P69',
datavalue:
{ value: { 'entity-type': 'item', 'numeric-id': 3273124, id: 'Q3273124' },
type: 'wikibase-entityid' },
datatype: 'wikibase-item' },
type: 'statement',
qualifiers:
{ P580:
[ { snaktype: 'value',
property: 'P580',
hash: 'a1db249baf916bb22da7fa5666d426954435256c',
datavalue:
{ value:
{ time: '+1971-01-01T00:00:00Z',
timezone: 0,
before: 0,
after: 0,
precision: 9,
calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
type: 'time' },
datatype: 'time' } ],
P582:
[ { snaktype: 'value',
property: 'P582',
hash: 'a065bff95f5cb3026ebad306b3df7587c8daa2e9',
datavalue:
{ value:
{ time: '+1979-01-01T00:00:00Z',
timezone: 0,
before: 0,
after: 0,
precision: 9,
calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
type: 'time' },
datatype: 'time' } ] },
'qualifiers-order': [ 'P580', 'P582' ],
id: 'q76$464382F6-E090-409E-B7B9-CB913F1C2166',
rank: 'normal' }
Then you might be interested in ways to extract readable results from those results.
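As a rough sketch of that extraction step, the Python below flattens the claims structure into one row per statement, with qualifiers kept in their declared order. The helper names `build_url` and `flatten_claims` are made up for illustration, the sample is an abridged copy of the statement shown above, and the HTTP fetch itself is left out.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def build_url(entity_id: str) -> str:
    """Build the wbgetentities URL used above for one entity."""
    params = {"action": "wbgetentities", "format": "json", "ids": entity_id}
    return API + "?" + urlencode(params)


def flatten_claims(claims: dict) -> list:
    """Turn entities.<id>.claims into rows: property, value, rank, qualifiers."""
    rows = []
    for prop, statements in claims.items():
        for st in statements:
            # 'datavalue' is absent for snaks without a concrete value.
            value = st["mainsnak"].get("datavalue", {}).get("value")
            quals = st.get("qualifiers", {})
            ordered = {}
            for qprop in st.get("qualifiers-order", list(quals)):
                ordered[qprop] = [
                    q.get("datavalue", {}).get("value") for q in quals[qprop]
                ]
            rows.append({
                "property": prop,
                "value": value,
                "rank": st.get("rank"),
                "qualifiers": ordered,
            })
    return rows


# Abridged from the P69 statement shown above.
sample_claims = {
    "P69": [{
        "mainsnak": {
            "snaktype": "value",
            "property": "P69",
            "datavalue": {"value": {"id": "Q3273124"}, "type": "wikibase-entityid"},
        },
        "qualifiers": {
            "P580": [{"datavalue": {"value": {"time": "+1971-01-01T00:00:00Z"}}}],
            "P582": [{"datavalue": {"value": {"time": "+1979-01-01T00:00:00Z"}}}],
        },
        "qualifiers-order": ["P580", "P582"],
        "rank": "normal",
    }]
}

rows = flatten_claims(sample_claims)
print(build_url("Q76"))
print(rows[0]["value"]["id"], list(rows[0]["qualifiers"]))
```

Because the qualifier properties are read from the data itself, there is no need to know in advance that a statement carries P580 or P582.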

elasticsearch splits by space in facets

I am trying to do a simple facet request over a field containing more than one word (Eg: 'Name1 Name2', sometimes with dots and commas inside) but what I get is...
"terms" : [{
"term" : "Name1",
"count" : 15
},
{
"term" : "Name2",
"count" : 15
}]
so my field value is split on spaces before the facet request runs...
Query example:
curl -XGET http://my_server:9200/idx_occurrence/Occurrence/_search?pretty=true -d '{
"query": {
"query_string": {
"fields": [
"dataset"
],
"query": "2",
"default_operator": "AND"
}
},
"facets": {
"test": {
"terms": {
"field": [
"speciesName"
],
"size": 50000
}
}
}
}'
Your field shouldn't be analyzed, or at least not tokenized. You need to update your mapping and then reindex if you want to index the field without tokenizing it.
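To illustrate the difference, here is a small Python sketch. The first part is a toy model of term counting that uses a plain whitespace split as a stand-in for the analyzer (the real default analyzer does more, such as lowercasing). The second part builds a mapping body, assuming an older Elasticsearch version that still has the facets API and the string type; the type name Occurrence and the field speciesName come from the question, everything else is an assumption.

```python
import json
from collections import Counter

# Hypothetical field values for three documents.
docs = ["Name1 Name2", "Name1 Name2", "Name3"]

# Analyzed (tokenized) field: every token becomes its own term.
analyzed_terms = Counter(token for value in docs for token in value.split())

# Not-analyzed field: the whole value is indexed as a single term.
exact_terms = Counter(docs)

# Mapping body that keeps speciesName as one term (assumed legacy syntax).
mapping = {
    "Occurrence": {
        "properties": {
            "speciesName": {"type": "string", "index": "not_analyzed"}
        }
    }
}
body = json.dumps(mapping)

print(analyzed_terms)
print(exact_terms)
print(body)
```

On recent Elasticsearch versions the equivalent is, as far as I know, a field of type keyword; either way, the existing documents need to be reindexed after the mapping change.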
First of all, javanna provided a very good answer from a practical perspective. However, for the sake of completeness, I want to mention that in some cases there is a way to do it without reindexing the data.
If the speciesName field is stored and your queries produce a relatively small number of results, you can use script_field to retrieve stored field values:
curl -XGET http://my_server:9200/idx_occurrence/Occurrence/_search?pretty=true -d '{
"query": {
"query_string": {
"fields": ["dataset"],
"query": "2",
"default_operator": "AND"
}
},
"facets": {
"test": {
"terms": {
"script_field": "_fields['\''speciesName'\''].value",
"size": 50000
}
}
}
}
'
As a result of this query, elasticsearch will retrieve the speciesName field for every record in your result set and it will construct facets from these values. Needless to say, if your result set contains millions of records, performance of this query might be sluggish.
Similarly, if the field is not stored, but record source is stored, you can use script_field facet to retrieve field values from the source:
......
"script_field": "_source['\''speciesName'\'']",
......
Again, source for each record in the result list will be retrieved and parsed, so you might need some patience to run this query on a large set of records.