Find entities with missing attributes in Datomic - missing-data

If I have the following Datomic database:
{ :fred :age 42 }
{ :fred :likes :pizza }
{ :sally :age 42 }
How do I query for both entities (:fred and :sally), getting back the attribute :likes :pizza for :fred and an empty value for :sally?
The query
[:find ?n ?a ?l
:where [?n :age ?a]
[?n :likes ?l]]
only returns :fred 42 :pizza.

Datomic has recently been updated with a few expression functions that you can use in queries. One of these is get-else, which lets you provide a default return value if an attribute doesn't exist on an entity, much like how clojure.core/get returns an optional third argument if the key isn't found.
So using your own example, you would only need to change it like so:
[:find ?n ?a ?l
:where [?n :age ?a]
[(get-else $ ?n :likes false) ?l]]
Unfortunately you can't actually make nil a "default" value since it's not a valid Datomic data type, and Datomic will carp if you try, but false should get you where you're going as well.
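To make the behaviour concrete, here is a toy model of what get-else does, written in Python over plain tuples. It is not the Datomic API: the DATOMS list and the helper names get_else and ages_and_likes are made up for illustration, and the real function runs inside the query engine.

```python
# Toy model of get-else semantics; this is NOT the Datomic API.
# Each datom is a plain (entity, attribute, value) tuple.
DATOMS = [
    ("fred", "age", 42),
    ("fred", "likes", "pizza"),
    ("sally", "age", 42),
]


def get_else(datoms, entity, attribute, default):
    """Return the first value of `attribute` on `entity`, or `default` if none is asserted."""
    for e, a, v in datoms:
        if e == entity and a == attribute:
            return v
    return default


def ages_and_likes(datoms):
    """Mirror the query: one row per entity with an :age, plus a defaulted :likes."""
    rows = []
    for e, a, v in datoms:
        if a == "age":
            rows.append((e, v, get_else(datoms, e, "likes", False)))
    return rows


print(ages_and_likes(DATOMS))
```

The entity without a :likes value still produces a row, with false (here Python's False) standing in for the missing value.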

You can also try the "missing?" fn.
Have a look at this:
http://docs.datomic.com/query.html#missing

Getting back a set of entities which may or may not have particular attributes asserted is analogous to a LEFT JOIN in a relational database.
The approach in Datomic is to do this in two steps: first query for the entities, then navigate from each entity to get the attribute value, or nil if the attribute is not asserted for that entity.
See the mailing list post How do you do a left join in Datomic? with its accompanying gist for an example.
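The two-step shape can be sketched in Python as follows. This is only an illustration of the idea, not the code from the gist: the ENTITIES dict stands in for the database, and entities_with and left_join are hypothetical helper names.

```python
# Hypothetical in-memory stand-in for the database: entity name -> attribute map.
ENTITIES = {
    "fred": {"age": 42, "likes": "pizza"},
    "sally": {"age": 42},
}


def entities_with(entities, attribute):
    """Step 1: find every entity that has `attribute` asserted."""
    return sorted(name for name, attrs in entities.items() if attribute in attrs)


def left_join(entities, required, optional):
    """Step 2: navigate from each entity to the optional attribute, or None."""
    return [
        (name, entities[name][required], entities[name].get(optional))
        for name in entities_with(entities, required)
    ]


print(left_join(ENTITIES, "age", "likes"))
```

Unlike the get-else approach, the missing value can be a real null here, because the navigation happens outside the query.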

Related

Get every places names in geonames with SPARQL DBPEDIA

I'm quite new to sparql.
I found this query to get all countries in the UN
select distinct ?s
where { ?s a <http://dbpedia.org/class/yago/WikicatMemberStatesOfTheUnitedNations> }
So I tried to adapt it to Geonames with:
select distinct ?s
where { ?s a <http://dbpedia.org/page/GeoNames> }
But it doesn't work. How can I get every place's name in geonames?
I hope someone can help me with that!
Every publisher uses its own namespace and method to generate URIs of the published entities. The nice thing about Linked Open Data is that it allows such independence while URIs can still be linked using agreed open standards. When different URIs represent the same thing, this is declared by linking them with owl:sameAs.
Your query attempt assumes that DBpedia and Geonames use the same URIs, if I understood the intention correctly (I'm not sure what you mean by "to adapt"). What you need to do is use two separate variables, and then specify that, of the owl:sameAs mappings, you want only those from Geonames.
select distinct *
where { ?countryDBpedia a <http://dbpedia.org/class/yago/WikicatMemberStatesOfTheUnitedNations> ;
owl:sameAs ?countryGeonames .
FILTER REGEX(STR(?countryGeonames), "geonames.org")
}
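If you post-process the owl:sameAs targets on the client side, a slightly stricter check than a substring match is to compare the host part of each URI. The sketch below assumes the links have already been fetched into a dict; the SAME_AS data uses placeholder identifiers rather than real Geonames or Wikidata IDs, and is_geonames and geonames_links are made-up helper names.

```python
from urllib.parse import urlparse

# Placeholder data: the numeric and Q identifiers below are NOT real IDs.
SAME_AS = {
    "http://dbpedia.org/resource/Norway": [
        "http://sws.geonames.org/0000000/",
        "http://www.wikidata.org/entity/Q0",
        "http://example.org/mirror/geonames.org/0000000",
    ],
}


def is_geonames(uri):
    """True if the URI's host is geonames.org or one of its subdomains."""
    host = urlparse(uri).hostname or ""
    return host == "geonames.org" or host.endswith(".geonames.org")


def geonames_links(same_as):
    """Keep only the owl:sameAs targets that are hosted on geonames.org."""
    return {
        subject: [target for target in targets if is_geonames(target)]
        for subject, targets in same_as.items()
    }


print(geonames_links(SAME_AS))
```

The third URI contains the text "geonames.org" in its path, so a plain regex would accept it, while the host comparison rejects it.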

SPARQL-Query for all objects with a specific prefix?

I created an ontology with different prefixes (rdf, rdfs, owl, example, car, bike, ...). I use them to demarcate different domains and examples.
How can I query for all objects with a given prefix, e.g. "car"?
Thank you in advance!
For the future, providing a minimal sample of the data will help others give you a working query. With no further details, and assuming that by "objects" you mean the objects of triples (and indeed untested):
PREFIX car: <TODO_ADD_URI_OF_NAMESPACE_HERE>
SELECT * {
?s ?p ?o .
FILTER(isUri(?o) && STRSTARTS(STR(?o), STR(car:)))
}
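The logic of that filter can be pictured with a small Python sketch: keep an object only if it is a URI and its string form starts with the namespace. The namespace http://example.org/car# and the sample triples are assumptions made up for this illustration, as are the helpers uri, lit and objects_with_prefix.

```python
# Hypothetical namespace bound to the "car" prefix.
CAR = "http://example.org/car#"


def uri(value):
    """Tag a term as a URI (stands in for isUri being true)."""
    return ("uri", value)


def lit(value):
    """Tag a term as a literal."""
    return ("literal", value)


TRIPLES = [
    (uri("http://example.org/ex#alice"), uri("http://example.org/ex#owns"), uri(CAR + "Beetle")),
    (uri("http://example.org/ex#bob"), uri("http://example.org/ex#owns"), uri("http://example.org/bike#Brompton")),
    (uri(CAR + "Beetle"), uri("http://example.org/ex#note"), lit(CAR + "not-a-uri")),
]


def objects_with_prefix(triples, namespace):
    """Mirror FILTER(isUri(?o) && STRSTARTS(STR(?o), namespace))."""
    found = set()
    for _subject, _predicate, obj in triples:
        kind, value = obj
        if kind == "uri" and value.startswith(namespace):
            found.add(value)
    return sorted(found)


print(objects_with_prefix(TRIPLES, CAR))
```

The literal in the third triple starts with the same text but is skipped, which is the role isUri plays in the query.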

Is there a way to get the 'is skos:broader of' of an entity from DBPedia using SPARQL?

Basically, I'm trying to get the 'subclasses' of this entity. For example:
I tried using --
select ?p1 where {
<http://dbpedia.org/resource/Category:Norwegian_silent_film_actors> skos:narrower ?p1 .
}
-- and --
select ?p1 where {
<http://dbpedia.org/resource/Category:Norwegian_silent_film_actors> rdfs:subclass ?p1 .
}
-- but since that's not actually its predicate, it doesn't work. Both actually return just the entity itself if a * is added after the predicate.
Is there any way of getting those objects?
It's important to remember that is skos:broader of relations are inverse skos:broader relations -- which imply but do not necessarily indicate the presence of skos:narrower statements. DBpedia doesn't have every explicit statement that might be inferred from what's there, and inference rules are not active by default.
You can use the explicit statements that do exist with queries like this, which uses the property path + for one-or-more skos:broader relationships --
select ?p1
where
{
?p1
skos:broader+
<http://dbpedia.org/resource/Category:Norwegian_silent_film_actors>
}
-- or this, which uses the property path ^ to reverse the relationship --
select ?p1
where
{
<http://dbpedia.org/resource/Category:Norwegian_silent_film_actors>
^skos:broader*
?p1
}
This is a place where inference rules might well be brought to bear. Unfortunately, there are no predefined inference rules relating skos:broader and skos:narrower, and this public endpoint does not accept ad-hoc rule additions. You could create some on a personal endpoint, whether pre-built and pre-populated with DBpedia in the cloud or otherwise.
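What the skos:broader+ query computes can be sketched in Python as a graph walk over the explicit broader statements, followed in reverse. The edge list below uses made-up category names, and narrower_closure is a hypothetical helper, not anything DBpedia or Virtuoso provides.

```python
# Each pair is (child, parent), i.e. "child skos:broader parent".
# The category names are invented for this illustration.
BROADER = [
    ("Cat:B", "Cat:A"),
    ("Cat:C", "Cat:B"),
    ("Cat:D", "Cat:X"),
]


def narrower_closure(broader_edges, category):
    """Return every node that reaches `category` via one or more broader links."""
    children = {}
    for child, parent in broader_edges:
        children.setdefault(parent, set()).add(child)

    seen = set()
    stack = [category]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:  # guards against cycles in the category graph
                seen.add(child)
                stack.append(child)
    return seen


print(sorted(narrower_closure(BROADER, "Cat:A")))
```

No skos:narrower statement is needed anywhere: the walk only reads the broader edges backwards, which is what the ^ path operator expresses.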

SPARQL RDFterm-equal FILTER

I am trying to validate concepts such that their respective categories are related by a parent/child relation (skos:broader). Getting the resources and their categories is trivial but then something is going on when I try to filter them with the relation:
select distinct *
where
{
<http://dbpedia.org/resource/Model-checking> dbo:wikiPageRedirect* ?conceptChild .
?conceptChild dbo:wikiPageRedirects* ?redirectedChild .
?redirectedChild dct:subject ?subjectChild .
?subjectChild skos:broader ?broaderThanSubjectChild .
<http://dbpedia.org/resource/Formal_methods> dbo:wikiPageRedirect* ?conceptParent .
?conceptParent dbo:wikiPageRedirects* ?redirectedParent .
?redirectedParent dct:subject ?subjectParent .
FILTER ( ?subjectParent = ?broaderThanSubjectChild )
}
This query has no results (via the Virtuoso SPARQL Query Editor on the public DBpedia endpoint) whereas the same query without the filter produces the expected results.
Any thoughts on this?
This does look like some kind of bug. Interestingly, if you replace the filter with a bind, e.g.,
bind((?subjectParent = ?broaderThanSubjectChild) as ?TEST)
you'll get 1 in one row, and 0 in the rest, so the comparison seems to be happening, but the filter is breaking for some reason.

How to get a concise bounded description of a resource with Sesame?

I've been testing Sesame 2.7.2 and I got a big surprise when faced with the fact that DESCRIBE queries do not include the blank node closure [EDIT: the right term for this is CBD, for concise bounded description].
If I understand correctly, the SPARQL spec is quite loose on that and says that what is returned is actually up to the provider, but I'm still surprised at the choice, since bnodes (in the results of the describe query) cannot be used in subsequent SPARQL queries.
So the question is: how can I get a closed description of a resource <uri1> without doing:
query DESCRIBE <uri1>
iterate over the result to determine which objects are blank nodes
then DESCRIBE ?b WHERE { <uri1> pred_relating_to_bnode_ ?b }
do it recursively and chaining over as long as bnodes are found
If I'm not mistaken, depth-2 bnodes would have to be described with
DESCRIBE ?b2 WHERE {<uri1> <p1> ?b . ?b <p2> ?b2 }
unless there is a simpler way to do this?
Finally, would it not be better and simpler to let DESCRIBE return a closed description of a resource where you can still obtain the currently returned result with something like the following?
CONSTRUCT {<uri1> ?p ?o} WHERE {<uri1> ?p ?o}
EDIT: here is an example of a closed result I want to get back from Sesame
<urn:sites#1> a my:WebSite .
<urn:sites#1> my:domainName _:autos1 .
<urn:sites#1> my:online "true"^^xsd:boolean .
_:autos1 a rdf:Alt .
_:autos1 rdf:_1 _:autos2 .
_:autos2 my:url "192.168.2.111:15001"@fr .
_:autos2 my:url "192.168.2.111:15002"@en .
Currently: DESCRIBE <urn:sites#1> returns me the same result as the query CONSTRUCT WHERE {<urn:sites#1> ?p ?o}, so I get only that
<urn:sites#1> a my:WebSite .
<urn:sites#1> my:domainName _:autos1 .
<urn:sites#1> my:online "true"^^xsd:boolean .
Partial solutions using SPARQL
Based on your comments, this isn't an exact solution yet, but note that you can describe multiple things in a given describe query. For instance, given the data:
@prefix : <http://example.org/> .
:Alice :named "Alice" ;
:likes :Bill, [ :named "Carl" ;
:likes [ :named "Daphne" ]].
:Bill :likes :Elaine ;
:named "Bill" .
you can run the query:
PREFIX : <http://example.org/>
describe :Alice ?object where {
:Alice :likes* ?object .
FILTER( isBlank( ?object ) )
}
and get the results:
@prefix : <http://example.org/> .
:Alice
:likes :Bill ;
:likes [ :likes [ :named "Daphne"
] ;
:named "Carl"
] ;
:named "Alice" .
That's not a complete description of course, because it's only following :likes out from :Alice, not arbitrary predicates. But it does get the blank nodes named "Carl" and "Daphne", which is a start.
The larger issue in Sesame
It looks like you're going to have to do something like what's described above, and possibly with multiple searches, or you're going to have to modify Sesame. The alternative to writing some creative SPARQL is to change the way that Sesame implements describe queries. Some endpoints make this relatively easy, but Sesame doesn't seem to be one of them. There's a mailing list thread from 2011, Custom SPARQL DESCRIBE Implementation, that seems addressed at this same problem.
Roberto García asks:
I'm trying to customise the behaviour of SPARQL DESCRIBE queries.
I'm willing to get something similar to CBD (i.e. all properties and
values for the described resource plus all properties and values for
the blank nodes connected to it).
I have tried to reproduce a similar behaviour using a CONSTRUCT query
but the performance is not good and the query gets quite complex if I
try to consider long chains of properties pointing to blank nodes
starting from the described resource.
Jeen Broekstra replies:
The implementation of DESCRIBE in Sesame is hardcoded in the query
parser. It can only be changed by adapting the parser itself, and even
then it will be tricky, as the query model has no easy way to express it
either: it needs an extension of the algebra.
> If this is not possible, any advice about how to implement it using CONSTRUCT
queries?
I'm not sure it's technically possible to do this in a single query.
CBDs are recursive in nature, and while SPARQL does have some support
for recursivity (property chains), the problem is that you have to do an
intermediate check in every step of the property chain to see if the
bound value is a blank node or not. This is not something that SPARQL
supports out of the box: property chains are defined to have only length
of the path as the stop condition.
Perhaps something is possible using a convoluted combination of
subqueries, unions and optionals, but I doubt it.
I think the best workaround is instead to use the standard DESCRIBE
format that Sesame supports, and for each blank node value in that
result do a separate consecutive query. In other words: you solve it by
hand.
The only other option is to log a feature request for support of CBDs in
Sesame. I can't give any guarantees about if/when that will be followed
up on though.
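The by-hand workaround Jeen describes can be sketched as a loop that keeps following blank-node objects until none are left. The Python below works on in-memory tuples rather than on a Sesame repository, so it only illustrates the traversal: the TRIPLES data is adapted from the question, blank nodes are assumed to be strings starting with "_:", and concise_bounded_description is a made-up name. It also ignores the reification part of the full CBD definition.

```python
# Sample data adapted from the question, plus one unrelated triple.
TRIPLES = [
    ("urn:sites#1", "rdf:type", "my:WebSite"),
    ("urn:sites#1", "my:domainName", "_:autos1"),
    ("_:autos1", "rdf:type", "rdf:Alt"),
    ("_:autos1", "rdf:_1", "_:autos2"),
    ("_:autos2", "my:url", "192.168.2.111:15001"),
    ("urn:other", "my:url", "192.168.2.111:15002"),
]


def is_blank(term):
    """Simplifying assumption: blank nodes are strings that start with '_:'."""
    return isinstance(term, str) and term.startswith("_:")


def concise_bounded_description(triples, resource):
    """Collect the triples about `resource`, then about each blank node reached."""
    by_subject = {}
    for triple in triples:
        by_subject.setdefault(triple[0], []).append(triple)

    description = []
    visited = set()
    pending = [resource]
    while pending:
        node = pending.pop()
        if node in visited:
            continue
        visited.add(node)
        for s, p, o in by_subject.get(node, []):
            description.append((s, p, o))
            if is_blank(o):  # the intermediate check a property path cannot do
                pending.append(o)
    return description


for triple in concise_bounded_description(TRIPLES, "urn:sites#1"):
    print(triple)
```

Against a real store, each pass of the loop would be one of the "separate consecutive queries" from the reply, which is only possible while the blank node identifiers stay valid between queries.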