SPARQL: How to retrieve a 1:N relationship?

After learning the basics of SPARQL I'm still trying to make sense of 1:N relationships. How can I retrieve an object with all its relationships as a single record?
For example, I have a Project linked to two Topics, and I try to retrieve it with:
PREFIX : <http://example.org/ontology/>   # hypothetical prefix; the original query omitted one

SELECT ?projName ?topic ?topicName {
  ?proj :hasName ?projName .
  ?proj :hasTopic ?topic .
  ?topic :hasName ?topicName .
  FILTER (?proj = <$uri>)
}
But what I get is:
result: [
  [
    projName: "My Project"
    topic: "TOPIC1_URI"
    topicName: "First Topic"
  ],
  [
    projName: "My Project"
    topic: "TOPIC2_URI"
    topicName: "Second Topic"
  ]
]
But I would want to get it as:
result: [
  projName: "My Project"
  topics: [
    [
      topic: "TOPIC1_URI"
      topicName: "First Topic"
    ],
    [
      topic: "TOPIC2_URI"
      topicName: "Second Topic"
    ]
  ]
]
How could I achieve this? I don't know what I'm missing, but I don't see how to do this with SPARQL.
Thanks a lot in advance

As you can read here, the result of a SPARQL SELECT query is a set of bindings, i.e., assignments of values to the free variables of the query. You can think of such bindings as a table, as for SQL queries, whose columns are the variables' names. But a flat table cannot represent nested structures, so you cannot arrange the bindings the way you want.
What you can do is collect all the data about topics into a single variable using the GROUP_CONCAT aggregate. For example:
PREFIX : <http://example.org/ontology/>   # hypothetical prefix, as above

SELECT
  ?projName
  (GROUP_CONCAT(?topicData; separator=", ") AS ?topics)
WHERE {
  ?proj :hasName ?projName .
  ?proj :hasTopic ?topic .
  ?topic :hasName ?topicName .
  FILTER (?proj = <$uri>)
  # STR() is needed: CONCAT takes string literals, and ?topic is an IRI
  BIND(CONCAT(STR(?topic), ": ", ?topicName) AS ?topicData)
}
GROUP BY ?projName
But remember that the values assigned to ?topics will be strings, not JSON arrays:
result: [
projName: "My Project"
topics: "TOPIC1_URI: First Topic, TOPIC2_URI: Second Topic"
]
Of course, you can choose separators other than ": " and ", ".
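If you plan to split ?topics apart again on the client side, it helps to pick separators that cannot occur in the data. A minimal sketch of the same query (same hypothetical prefix as above), using "|" between topics and ";" inside each one:
PREFIX : <http://example.org/ontology/>   # hypothetical prefix, as above

SELECT
  ?projName
  (GROUP_CONCAT(CONCAT(STR(?topic), ";", ?topicName); separator="|") AS ?topics)
WHERE {
  ?proj :hasName ?projName .
  ?proj :hasTopic ?topic .
  ?topic :hasName ?topicName .
  FILTER (?proj = <$uri>)
}
GROUP BY ?projName
The client can then split on "|" to recover one string per topic, and on ";" to separate the topic URI from its name.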

Related

Wikidata Query Service/Categories: number of pages/subcategories and HiddenCategory attributes

Using the gas:service or mediawiki:categoryTree services of the Wikidata API, is it possible to somehow include the mediawiki:pages, mediawiki:subcategories and mediawiki:HiddenCategory attributes in query results? I see these attributes in dumps, but have had no luck trying to access them programmatically (with SPARQL or some other API)...
You just need to add your conditions, e.g. for pages add:
?out mediawiki:pages ?pages .
Result
{
  "out" : {
    "type" : "uri",
    "value" : "https://en.wikipedia.org/wiki/Category:Fictional_ducks"
  },
  "depth" : {
    "datatype" : "http://www.w3.org/2001/XMLSchema#int",
    "type" : "literal",
    "value" : "1"
  },
  "pages" : {
    "datatype" : "http://www.w3.org/2001/XMLSchema#integer",
    "type" : "literal",
    "value" : "113"
  }
}
They warn that you can't access this through the UI, so you need to encode your query and pass it in the URL: https://query.wikidata.org/bigdata/namespace/categories/sparql?query=&format=json
Full query:
PREFIX gas: <http://www.bigdata.com/rdf/gas#>
PREFIX mediawiki: <https://www.mediawiki.org/ontology#>

SELECT * WHERE {
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.BFS" .
    gas:program gas:linkType mediawiki:isInCategory .
    gas:program gas:traversalDirection "Reverse" .
    gas:program gas:in <https://en.wikipedia.org/wiki/Category:Ducks> . # one or more times, specifies the initial frontier.
    gas:program gas:out ?out .        # exactly once - will be bound to the visited vertices.
    gas:program gas:out1 ?depth .     # exactly once - will be bound to the depth of the visited vertices.
    gas:program gas:maxIterations 8 . # optional limit on breadth-first expansion.
  }
  ?out mediawiki:pages ?pages .
} ORDER BY ASC(?depth)
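The question also asks about mediawiki:subcategories and mediawiki:HiddenCategory. A hedged extension of the same query (the extra predicate names are taken from the question, not verified against the endpoint; OPTIONAL keeps categories that lack one of the attributes):
PREFIX gas: <http://www.bigdata.com/rdf/gas#>
PREFIX mediawiki: <https://www.mediawiki.org/ontology#>

SELECT * WHERE {
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.BFS" .
    gas:program gas:linkType mediawiki:isInCategory .
    gas:program gas:traversalDirection "Reverse" .
    gas:program gas:in <https://en.wikipedia.org/wiki/Category:Ducks> .
    gas:program gas:out ?out .
    gas:program gas:out1 ?depth .
    gas:program gas:maxIterations 8 .
  }
  ?out mediawiki:pages ?pages .
  OPTIONAL { ?out mediawiki:subcategories ?subcategories . }
  OPTIONAL { ?out mediawiki:HiddenCategory ?hidden . }
} ORDER BY ASC(?depth)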

JSON response for a nested SPARQL

I have the structure below. I am currently looking to get a JSON response like this with a SPARQL query. I have tried a few things like CONCAT and STR, but they didn't work out that well for me; it got complicated at the 3rd level.
I have now added 2 frames that I have tried with JSON-LD framing. It gives the correct output up to the first level, but at the second level it fails to expand the data.
:Reference xx1:timestamp "12/15/2020" .
:Reference xx2:logs xxx:log1 .
:log1 rdf:type xxx:Logs .
:log1 xx1:approver "xxx:bob" .
:log1 xx1:requester "xxx:daisy" .
:log1 xx1:timestamp "12/15/2020" .
:log1 xx1:name "log1" .
:log2 rdf:type xxx:Logs .
:log2 xx1:approver "xxx:bob" .
:log2 xx1:requester "xxx:daisy" .
:log2 xx1:timestamp "18/15/2020" .
:log2 xx1:name "log2" .
:bob rdf:type xxx:User .
:bob xx1:name "bob" .
:daisy rdf:type xxx:User .
:daisy xx1:name "daisy" .
Required Response with SPARQL (3 Levels)
[
  {
    "timestamp": "12/15/2020",
    "logs": [
      { "name": "log1", "timestamp": "12/15/2020", "approver": { "name": "bob" }, "requester": { "name": "daisy" } },
      { "name": "log2", "timestamp": "18/15/2020", "approver": { "name": "bob" }, "requester": { "name": "daisy" } }
    ]
  }
]
JSON-LD FRAMING
FRAME 1
{
  "@context": {
    "xxx": "http://ABC"
  },
  "@type": "xxx:Reference",
  "contains": {
    "@type": "xxx:Log",
    "contains": {
      "@type": "xxx:User"
    }
  }
}
FRAME 2
{
  "@context": {
    "xxx": "http://ABC"
  },
  "@type": "xxx:Reference",
  "contains": {
    "@type": "xxx:Log",
    "hasApprover": { "@type": "xxx:User" },
    "hasRequester": { "@type": "xxx:User" }
  }
}
The output that I get is
Reference [Log1 {User is expanded}, Log2 {User is not expanded}]
What I need is
Reference [Log1 {User is expanded}, Log2 {User is expanded}]
This JSON-LD frame helps to get the required result:
{
  "@context": {
    "xxx": "http://ABC"
  },
  "@type": "xxx:Reference",
  "contains": {
    "@type": "xxx:Log",
    "hasApprover": { "@type": "xxx:User", "@embed": "@always" },
    "hasRequester": { "@type": "xxx:User", "@embed": "@always" }
  }
}
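For completeness, the graph that gets framed can be produced with a SPARQL CONSTRUCT query. A minimal sketch, assuming the question's xx1/xx2 prefixes expand to real namespaces (the PREFIX IRIs below are placeholders):
PREFIX xx1: <http://ABC/xx1#>   # placeholder namespace
PREFIX xx2: <http://ABC/xx2#>   # placeholder namespace

CONSTRUCT {
  ?ref xx1:timestamp ?ts ;
       xx2:logs ?log .
  ?log ?logProp ?logValue .
}
WHERE {
  ?ref xx2:logs ?log ;
       xx1:timestamp ?ts .
  ?log ?logProp ?logValue .
}
Serializing the CONSTRUCT result as JSON-LD and applying the frame above then yields the nested response.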

SPARQL filter xsd:Date fails

I have the following SPARQL query
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?d
WHERE {
  GRAPH <https://my/triples> {
    ?s <http://my/ontology/v1.0#hasTimestamp> ?d .
    FILTER(?d > "1945-01-01"^^xsd:date)
  }
}
When I execute the above, it fails to filter the results by date correctly. Actually, it gives me no results at all, and no errors.
If I remove the date filter I get this response:
"bindings": [
{
"s": { "type": "uri" , "value": "seo:S2A_MSI_2019_11_28_09_33_31_T33SXB_t_dogliotti" } ,
"d": { "datatype": "xsd:date" , "type": "typed-literal" , "value": "2019-11-28" }
}
What could be wrong?
I tried all of the suggestions in the comments but I couldn't make it work properly.
I stumbled upon a post from 2011 (!!) that partially gave me the solution.
The post is here, and it states that the solution below (i.e., "stringifying" the date) works only for equality:
FILTER(str(?d) >= "2018") .
and that casting to xsd:dateTime works for comparisons.
In my case the first solution worked for comparisons (including equality), while casting failed miserably.
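For reference, here is the original query with the string-comparison workaround applied; the casting variant from the 2011 post is sketched in a comment (which of the two works seems to depend on the triple store):
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?d
WHERE {
  GRAPH <https://my/triples> {
    ?s <http://my/ontology/v1.0#hasTimestamp> ?d .
    # comparing the ISO lexical forms as strings sorts chronologically
    FILTER(STR(?d) > "1945-01-01")
    # casting variant (worked in the 2011 post, failed here):
    # FILTER(xsd:dateTime(?d) > "1945-01-01T00:00:00"^^xsd:dateTime)
  }
}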

Best approach to create URIs with SPARQL (like auto increment)

I am currently writing a service which creates new items (data) from user input. To save these items in an RDF graph store (currently Sesame via SPARQL 1.1), I need to add a subject URI to the data. My approach is to use a number that is incremented for each new item, e.g.:
<http://example.org/item/15> dct:title "Example Title" .
<http://example.org/item/16> dct:title "Other Item" .
What's the best approach to get an incremented number for new items (like auto increment in MySQL/MongoDB) via SPARQL? Or to issue some data and have the endpoint automatically create a URI from a template (as is done for blank nodes)?
But I don't want to use blank nodes as subjects for these items.
Is there a better solution than using an incremented number? My users don't care about the URI... and I don't want to handle collisions like those created by hashing the data and using the hash as part of the subject.
If you maintain a designated counter during updates, then something along these lines will do it.
First, insert a counter into your dataset:
INSERT DATA {
  GRAPH <urn:counters> { <urn:Example> <urn:count> 1 }
}
Then a typical update looks like this:
PREFIX dct: <http://purl.org/dc/terms/>
DELETE {
  # remove the old value of the counter
  GRAPH <urn:counters> { <urn:Example> <urn:count> ?old }
}
INSERT {
  # store the new value of the counter
  GRAPH <urn:counters> { <urn:Example> <urn:count> ?new }
  # add your new data using the constructed IRI
  GRAPH <http://example.com> {
    ?id dct:title "Example Title" ;
        a <http://example.org/ontology/Example> .
  }
}
WHERE {
  # retrieve the counter
  GRAPH <urn:counters> { <urn:Example> <urn:count> ?old }
  # compute the new value
  BIND(?old + 1 AS ?new)
  # construct the IRI
  BIND(IRI(CONCAT("http://example.org/item/", STR(?old))) AS ?id)
}
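Note that this read-modify-write cycle is only safe if the endpoint serializes updates; two clients running it concurrently could read the same ?old. To inspect the counter's current value, a trivial check:
SELECT ?n WHERE {
  GRAPH <urn:counters> { <urn:Example> <urn:count> ?n }
}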
Assuming the class of your items is http://example.org/ontology/Example, the query becomes the following. Note: items must be inserted one by one, as only one new URI is computed per transaction.
PREFIX dct: <http://purl.org/dc/terms/>
INSERT {
  GRAPH <http://example.com> {
    ?id dct:title "Example Title" ;
        a <http://example.org/ontology/Example> .
  }
}
WHERE {
  SELECT ?id WHERE {
    {
      SELECT (COUNT(*) AS ?c) WHERE {
        GRAPH <http://example.com> { ?s a <http://example.org/ontology/Example> }
      }
    }
    BIND(IRI(CONCAT("http://example.org/item/", STR(?c))) AS ?id)
  }
}
(Tested with GraphDB 8.4.0 using RDF4J 2.2.2)
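One caveat with counting instances: if items are ever deleted, COUNT(*) decreases and a later insert can produce a URI that already exists. A hedged alternative, assuming every item IRI follows the http://example.org/item/N template, derives the next number from the largest existing suffix instead:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?next WHERE {
  {
    SELECT (MAX(xsd:integer(STRAFTER(STR(?s), "http://example.org/item/"))) AS ?max) WHERE {
      GRAPH <http://example.com> { ?s a <http://example.org/ontology/Example> }
    }
  }
  # COALESCE covers the empty graph, where ?max is unbound
  BIND(COALESCE(?max, 0) + 1 AS ?next)
}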
You said you are open to other options than an auto-incremented number. One good alternative is to use UUIDs.
If you don't care at all what the URI looks like, you can use the UUID function:
PREFIX dct: <http://purl.org/dc/terms/>
INSERT {
  ?uri dct:title "Example Title"
}
WHERE {
  BIND (UUID() AS ?uri)
}
This will generate URIs like <urn:uuid:b9302fb5-642e-4d3b-af19-29a8f6d894c9>.
If you'd rather have HTTP URIs in your own namespace, you can use strUUID:
PREFIX dct: <http://purl.org/dc/terms/>
INSERT {
  ?uri dct:title "Example Title"
}
WHERE {
  BIND (IRI(CONCAT("http://example.org/item/", STRUUID())) AS ?uri)
}
This will generate URIs like http://example.org/item/73cd4307-8a99-4691-a608-b5bda64fb6c1.
UUIDs are pretty good. Collision risk is negligible. The functions are part of the SPARQL standard. The only downside really is that they are long and ugly.

how to programmatically get all available information from a Wikidata entity?

I'm really new to Wikidata. I just figured out that Wikidata uses a lot of reification.
Suppose we want to get all the information available for Obama. If we were doing it with DBpedia, we would just use a simple query:
select * where { <http://dbpedia.org/resource/Barack_Obama> ?p ?o . }
This would return all the properties and values with Obama as the subject. Essentially the result is the same as this page: http://dbpedia.org/page/Barack_Obama, but the query result is in a format I need.
I'm wondering how to do the same thing with Wikidata. This is the Wikidata page for Obama: https://www.wikidata.org/wiki/Q76. Let's say I want all the statements on this page. But almost all the statements on this page are reified, in that they have ranks and qualifiers, etc. For example, the "educated at" part not only has the school, but also a "start time" and an "end time", and all the schools are ranked as normal since Obama no longer attends them.
I could just get all the schools by getting the truthy statements (using https://query.wikidata.org):
SELECT ?school ?schoolLabel WHERE {
  wd:Q76 wdt:P69 ?school .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
The above query will simply return all the schools.
If I want to get the start time and end time of the school, I need to do this:
SELECT ?school ?schoolLabel ?start ?end WHERE {
  wd:Q76 p:P69 ?school_statement .
  ?school_statement ps:P69 ?school .
  ?school_statement pq:P580 ?start .
  ?school_statement pq:P582 ?end .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
But the thing is, without looking at the actual page, how would I know that ?school_statement has pq:P580 and pq:P582, namely the "start time" and "end time"? It all comes down to this question: how do I get all the information (including reification) from https://www.wikidata.org/wiki/Q76?
Ultimately, I would expect a table like this:
||predicate||object||objectLabel||qualifier1||qualifier1Value||qualifier2||qualifier2Value||...
You should probably go for the Wikidata API (more specifically the wbgetentities module) instead of the SPARQL endpoint:
In your case:
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q76
You should find all the qualifier data you were looking for. Example with entities.Q76.claims.P69.1:
{ mainsnak:
{ snaktype: 'value',
property: 'P69',
datavalue:
{ value: { 'entity-type': 'item', 'numeric-id': 3273124, id: 'Q3273124' },
type: 'wikibase-entityid' },
datatype: 'wikibase-item' },
type: 'statement',
qualifiers:
{ P580:
[ { snaktype: 'value',
property: 'P580',
hash: 'a1db249baf916bb22da7fa5666d426954435256c',
datavalue:
{ value:
{ time: '+1971-01-01T00:00:00Z',
timezone: 0,
before: 0,
after: 0,
precision: 9,
calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
type: 'time' },
datatype: 'time' } ],
P582:
[ { snaktype: 'value',
property: 'P582',
hash: 'a065bff95f5cb3026ebad306b3df7587c8daa2e9',
datavalue:
{ value:
{ time: '+1979-01-01T00:00:00Z',
timezone: 0,
before: 0,
after: 0,
precision: 9,
calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
type: 'time' },
datatype: 'time' } ] },
'qualifiers-order': [ 'P580', 'P582' ],
id: 'q76$464382F6-E090-409E-B7B9-CB913F1C2166',
rank: 'normal' }
Then you might be interested in ways to extract readable results from those responses.
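If you would rather stay in SPARQL, here is a hedged sketch of a generic query for https://query.wikidata.org that enumerates every statement on Q76 together with its qualifiers, without knowing property IDs like P580/P582 in advance. It walks the wikibase:claim / wikibase:statementProperty / wikibase:qualifier links of the Wikibase ontology; the variable names are mine:
SELECT ?propLabel ?value ?valueLabel ?qualifierPropLabel ?qualifierValue WHERE {
  wd:Q76 ?p ?statement .
  ?statement ?ps ?value .
  ?prop wikibase:claim ?p ;
        wikibase:statementProperty ?ps .
  OPTIONAL {
    ?statement ?pq ?qualifierValue .
    ?qualifierProp wikibase:qualifier ?pq .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} ORDER BY ?propLabel ?statement
Each row is one (statement, qualifier) pair, so a statement with two qualifiers appears twice; pivoting that into the wide table sketched above is then a client-side step.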