Define rules for AllegroGraph triples and how to apply them - semantic-web

I'm using AllegroGraph to store statements like these:
<newsid1 hasAnnotation Gamma>
<newsid1 hasAnnotation Beta>
I would like to define a rule on these statements that says: if the subject newsid1 hasAnnotation either Gamma or Beta, then add a new statement to the triplestore saying that the subject hasAnnotation Theta, i.e. the statement
<newsid1 hasAnnotation Theta>
My questions are the following:
How can I define such a rule in AllegroGraph?
How can I apply these rules over the statements?

1) You can use Prolog functors to define these rules. In your case, you would define:
;; Functors to add triples.
(<-- (a-- ?s ?p ?o)
;; Fails unless all parts ground.
(lispp (not (get-triple :s ?s :p ?p :o ?o)))
(lisp (add-triple ?s ?p ?o)))
;; Functors to seek news that should have theta annotation
(<-- (shouldHaveAnnotationTheta ?news)
(q- ?news !namespace:hasAnnotation !"Gamma"))
(<- (shouldHaveAnnotationTheta ?news)
(q- ?news !namespace:hasAnnotation !"Beta"))
2) Then run the following Prolog query (using AGview, for example) to add the new statements:
(select (?news)
(shouldHaveAnnotationTheta ?news)
(a-- ?news !namespace:hasAnnotation !"Theta")
(fail))
You can read the following documents to understand this code:
Prolog functors
Lisp Reference
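To make the logic of the two rules concrete, here is a minimal sketch in plain Python rather than AllegroGraph's Prolog. It assumes the triples live in an in-memory set of tuples. The function name apply_theta_rule and the bare string names are hypothetical and are not part of any AllegroGraph API.

```python
# Hypothetical in-memory stand-in for the triple store: a set of (s, p, o) tuples.
def apply_theta_rule(triples):
    """Add (s, 'hasAnnotation', 'Theta') for every s annotated with Gamma or Beta."""
    triggers = {"Gamma", "Beta"}
    # Collect the matching subjects first so the set is not modified while iterating.
    subjects = {s for (s, p, o) in triples if p == "hasAnnotation" and o in triggers}
    # Like a--, only add statements that are not already present.
    new_triples = {(s, "hasAnnotation", "Theta") for s in subjects} - triples
    triples |= new_triples
    return new_triples


store = {
    ("newsid1", "hasAnnotation", "Gamma"),
    ("newsid1", "hasAnnotation", "Beta"),
    ("newsid2", "hasAnnotation", "Alpha"),
}
added = apply_theta_rule(store)
```

Running the function a second time adds nothing, which mirrors the existence check in the a-- functor.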

Related

Functions in SPARQL to Manipulate IRIs?

I want to write some reusable SPARQL queries to do things like take an IRI, get the name part (typically after the # sign), modify it (e.g., replace underscores with blank spaces) and put it in the rdfs:label property. This would be useful for Protege which doesn't fill in the rdfs:label if you use user defined IRIs. Or take an IRI with a user defined name, do the same and then replace the user defined IRI with a UUID. I looked in the SPARQL spec for functions to manipulate IRIs and either they don't exist or I'm missing something obvious. I'm posting this to make sure it isn't the latter. I know it is easy to do the equivalent with things like SUBSTR but I'm surprised that there aren't predefined operators to do things like getting the name part of an IRI and getting the base and want to double check before I roll my own.
In case anyone else wants to do this, I figured it out. There are some answers on this site, but they are all for SQL or languages other than SPARQL. The following is for classes, and it should be obvious how to adapt it for other entities. Note: this works in the Snap SPARQL Plugin for Protege (that's why I use CONSTRUCT rather than INSERT). However, there is a bug in their implementation of SUBSTR: it uses 0-based indexing rather than 1-based indexing as the spec says. The query below is written for that behaviour, so if you use it in a spec-compliant engine, change the 1 to a 2.
CONSTRUCT {?c rdfs:label ?lblname.}
WHERE {?c rdfs:subClassOf owl:Thing.
BIND(STRAFTER(STR(?c), '#') as ?name)
BIND(REPLACE(?name,"([A-Z])", " $1" ) as ?namewbs)
BIND (IF (STRSTARTS(?namewbs," "),SUBSTR(?namewbs,1),?namewbs) AS ?lblname)
FILTER(?c != owl:Thing && ?c != owl:Nothing)}
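For checking the string handling outside a SPARQL engine, the same three steps (take the part after '#', put a space before each capital, drop a leading space) can be sketched in Python. The function name label_from_iri and the example IRI are made up for illustration. Python slices are 0-based, so the indexing question around SUBSTR does not arise here.

```python
import re


def label_from_iri(iri):
    """Mirror the query: take the part after '#', then put a space before each capital."""
    # STRAFTER returns "" when the separator is absent; str.partition behaves the same way.
    name = iri.partition("#")[2]
    spaced = re.sub(r"([A-Z])", r" \1", name)
    # Drop the leading space that appears when the name starts with a capital letter.
    return spaced[1:] if spaced.startswith(" ") else spaced


label = label_from_iri("http://example.org/onto#RedWineGlass")
```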

Select all literals from a specific language in a SPARQL query

In a SPARQL query, it is possible to choose a language for each literal (and also strip the language tag):
CONSTRUCT {?x dc:title ?stripped_title.}
WHERE{
?x rdfs:label ?title .
FILTER (langMatches(lang(?title),"en"))
BIND (STR(?title) AS ?stripped_title)
}
But, if the query has many different literal properties that can be very verbose. Is there a way to set a default language for all literals in a query? Or, at least, make this query less verbose?
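This does not settle whether SPARQL itself offers a default-language setting, but the repetition can at least be moved out of the hand-written query by generating it on the client. A minimal sketch, assuming the query text is assembled in Python; the helper name english_literal_patterns, the _raw variable suffix and the dc:description property are hypothetical choices:

```python
def english_literal_patterns(subject_var, properties, lang="en"):
    """Build the triple pattern, language FILTER and BIND for each (property, name) pair."""
    lines = []
    for prop, name in properties:
        lines.append(f"{subject_var} {prop} ?{name}_raw .")
        lines.append(f'FILTER (langMatches(lang(?{name}_raw), "{lang}"))')
        lines.append(f"BIND (STR(?{name}_raw) AS ?{name})")
    return "\n".join(lines)


where_body = english_literal_patterns(
    "?x", [("rdfs:label", "title"), ("dc:description", "description")]
)
```

The returned text is meant to be pasted between the braces of the WHERE clause; each property then costs one list entry instead of three query lines.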

SPARQL: is there any path between two nodes?

Is there a good kind of SPARQL query that lets me answer whether two given nodes are connected on a single SPARQL endpoint or across multiple endpoints?
Let's say I want to check if the two nodes
<http://wiktionary.dbpedia.org/resource/dog>
and
<http://dbpedia.org/resource/Dog>
are connected. If yes, I'd be interested in the path.
By guessing, I already knew they were connected via the label, so a query like this returns a path of length 3:
SELECT * WHERE {
<http://wiktionary.dbpedia.org/resource/dog> ?p1 ?n1.
# SERVICE <http://dbpedia.org/sparql> {
<http://dbpedia.org/resource/Dog> ?p2 ?n1 .
# }
}
try yourself
Now what if I don't have an idea yet and want to do this automatically, for arbitrary length and direction?
I'm aware of SPARQL 1.1's property paths, but they only seem to work for known properties (http://www.w3.org/TR/sparql11-query/#propertypaths):
Variables can not be used as part of the path itself, only the ends.
Also, I would want to allow for any path, so the predicates on the path may change.
My current approach (which I find ridiculous) is to query for all possible paths of length k up to a limit of n.
Dumping isn't an option for me as it's billions of triples... I want to use SPARQL!
While you can't use variables in property paths, you can use a wildcard by taking advantage of the fact that for any URI, every property either is that property or isn't. E.g., (<>|!<>) matches any property, since every property either is <> or isn't. You can make a wildcard that goes in either direction by alternating that with itself in the other direction: (<>|!<>)|^(<>|!<>). That means that there's a path, with properties going in either direction, between two nodes ?u and ?v when
?u ((<>|!<>)|^(<>|!<>))* ?v
For instance, the following query should return true (indicating that there is a path):
ASK {
<http://wiktionary.dbpedia.org/resource/dog> ((<>|!<>)|^(<>|!<>))* <http://dbpedia.org/resource/Dog>
}
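Since the wildcard is tedious to type, the ASK query can be assembled by a small helper. This is only a string-building sketch; the function name any_path_ask is hypothetical and nothing here contacts an endpoint.

```python
def any_path_ask(start_iri, end_iri):
    """Build the ASK query for two IRIs, using (<>|!<>) as a property wildcard."""
    wildcard = "(<>|!<>)"
    # Alternate the wildcard with its inverse so edges may point in either direction.
    either_direction = f"({wildcard}|^{wildcard})"
    return f"ASK {{ <{start_iri}> {either_direction}* <{end_iri}> }}"


query = any_path_ask(
    "http://wiktionary.dbpedia.org/resource/dog",
    "http://dbpedia.org/resource/Dog",
)
```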
Now, to actually get the links of a path between two nodes, you can do (letting <wildcard> stand for the nasty-looking wildcard):
?start <wildcard>* ?u .
?u ?p ?v .
?v <wildcard>* ?end .
Then ?u, ?p, and ?v give you all the edges on the path. Note that if there are multiple paths, you'll be getting all the edges from all the paths. Since your wildcards go in either direction, you can actually get to anything reachable from the ?start or ?end, so you really should consider restricting the wildcard somehow.
On the endpoint that you linked to, the ASK query does not return true, but that appears to be an issue with Virtuoso's implementation of property paths rather than a problem with the query itself.
Do note that this will be trivially satisfied in many cases if you have any kind of inference happening. E.g., if you're using OWL, then every individual is an instance of owl:Thing, so there'd always be a path of the form:
?u →rdf:type owl:Thing ←rdf:type ?v
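If a manageable subset of the triples can be pulled to the client, the same "any property, either direction" search can be done with a breadth-first search, which returns one shortest path rather than the edges of all paths. This is a sketch over an in-memory list of tuples, not something a SPARQL endpoint does for you; the function name find_path and the sample data are made up.

```python
from collections import deque


def find_path(triples, start, end):
    """Breadth-first search that follows edges in either direction.

    Returns a list of (subject, predicate, object) edges, or None if no path exists.
    """
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((o, (s, p, o)))
        neighbours.setdefault(o, []).append((s, (s, p, o)))  # inverse direction

    came_from = {start: None}  # node -> (previous node, edge used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt, edge in neighbours.get(node, []):
            if nxt not in came_from:
                came_from[nxt] = (node, edge)
                queue.append(nxt)
    if end not in came_from:
        return None

    # Walk back from the end node to the start node, collecting the edges used.
    path = []
    node = end
    while came_from[node] is not None:
        node, edge = came_from[node]
        path.append(edge)
    path.reverse()
    return path


sample = [
    ("wikt:dog", "rdfs:label", "dog"),
    ("dbr:Dog", "rdfs:label", "dog"),
    ("dbr:Cat", "rdfs:label", "cat"),
]
path = find_path(sample, "wikt:dog", "dbr:Dog")
```

As with the wildcard query, any inferred rdf:type owl:Thing triples in the data would make almost every pair of nodes connected, so it is worth filtering those out first.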

How do I consistently query dbpedia for programming languages by name?

There seems to be no consistent way to query for programming languages based on name. Examples:
http://dbpedia.org/page/D_(programming_language)
rdfs:label "D (programming language)"#en
dbpprop:name "D programming language"
owl:sameAs freebase:"D (programming language)"
foaf:name "D programming language"
vs.
http://dbpedia.org/page/C++
rdfs:label "C++"#en
dbpprop:name "C++"
owl:sameAs freebase:"C++"
foaf:name "C++"
Since there's no standard convention for whether "programming language", "(programming language)", "programming_language", "(programming_language)", or "" is part of a name for a programming language in DBpedia, I have no idea how to consistently search by name.
I'd like to create some sort of SPARQL query that returns http://dbpedia.org/page/D_(programming_language) for "D" and http://dbpedia.org/page/C++ for "C++", but I don't know how to do this.
Unless at least one of the various triples for programming languages uses a consistent naming convention, I'll have to hack it by querying first against name + " (programming_language)", and falling back to name + " (programming language)" and name + " programming language" when no results are found. But I'd like a much more robust method.
You could of course just match using a basic substring match or a regex, e.g. like this to find a match for "C++":
SELECT DISTINCT ?pl ?label
WHERE {
?pl a dbpedia-owl:ProgrammingLanguage ;
rdfs:label ?label .
FILTER(langMatches(lang(?label), "en"))
FILTER(regex(str(?label), "C\\+\\+"))
}
Of course, the above will be problematic for a programming language name like "D", since you will get back several matches ("D", "Dylan", "MAD", etc.). In those cases, you might want to do some clever postprocessing of the result, for example tokenizing the returned label and seeing if your input string occurs as a standalone word.
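The standalone-word check could look roughly like this. It is a sketch that treats whitespace and parentheses as the only token separators, which is an assumption about how the labels are written; the function name mentions_as_word and the sample labels are made up.

```python
import re


def mentions_as_word(label, name):
    """True if `name` occurs in `label` delimited by whitespace, parentheses or string ends."""
    # \b is avoided on purpose: it does not work next to characters such as '+'.
    pattern = r"(?<![^\s()])" + re.escape(name) + r"(?![^\s()])"
    return re.search(pattern, label) is not None


labels = [
    "D (programming language)",
    "Dylan (programming language)",
    "MAD (programming language)",
    "C++",
]
matches = [label for label in labels if mentions_as_word(label, "D")]
```

re.escape takes care of names like "C++", so the same helper can be used for every language name without hand-written escaping.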
Regex matching in SPARQL is notoriously expensive (in terms of evaluation time), but since you combine it with a type constraint to a particular category, the DBpedia endpoint should be able to handle this kind of query just fine.
I'd use
SELECT distinct ?pl ?label
WHERE {
?pl a dbpedia-owl:ProgrammingLanguage ;
rdfs:label ?label.
?label bif:contains "'C++'" .
filter (str (?label) like '%C++%')
filter (lang(?label)="en")
}
?label bif:contains "'C++'" would filter to some degree, more specifically to C++, C, Objective-C etc., because ++ would be treated as noise and excluded from the actual search pattern.
After that you have C and need the two pluses, so filter (str (?label) like '%C++%') would check for them faster than a regex.
Add filter (lang(?label)="en") or filter (langmatches(lang(?label),"en")), or no language check at all, to taste.

How to recursively expand blank nodes in SPARQL construct query?

There is probably an easy answer to this, but I can't even figure out how to formulate the Google query to find it.
I'm writing SPARQL construct queries against a dataset that includes blank nodes. So if I do a query like
CONSTRUCT {?x ?y ?z .}
WHERE {?x ?y ?z .}
Then one of my results might be:
nm:John nm:owns _:Node
Which is a problem if all of the
_:Node nm:has nm:Hats
triples don't also get into the query result somehow (because some parsers I'm using like rdflib for Python really don't like dangling bnodes).
Is there a way to write my original CONSTRUCT query to recursively add all triples attached to any bnode results such that no bnodes are left dangling in my new graph?
Recursion isn't possible. The closest I can think of is SPARQL 1.1 property paths (note: that version is out of date) but bnode tests aren't available (afaik).
You could just remove the statements with trailing bnodes:
CONSTRUCT {?x ?y ?z .} WHERE
{
?x ?y ?z .
FILTER (!isBlank(?z))
}
or try your luck fetching the next bit:
CONSTRUCT {?x ?y ?z . ?z ?w ?v } WHERE
{
?x ?y ?z .
OPTIONAL {
?z ?w ?v
FILTER (isBlank(?z) && !isBlank(?v))
}
}
(that last query is pretty punishing, btw)
You may be better off with DESCRIBE, which will often skip bnodes.
As user205512 suggests, performing that grab recursively is not possible, and as they point out, using OPTIONALs to go arbitrarily many levels down into your data to get the nodes is not feasible on anything but trivially sized databases.
Bnodes themselves are locally scoped, to the result set or to the file. There's no guarantee that a bnode you get from parsing or from a result set has the same id that is used in the database (though some databases do guarantee this for query results). Furthermore, a query like "select ?s where { ?s ?p _:bnodeid1 }" is the same as "select ?s where { ?s ?p ?o }": the bnode is treated as a variable in that case, not as "the thing with the id 'bnodeid1'". This quirk of the design makes it difficult to query for bnodes, so if you are in control of the data, I'd suggest not using them. It's not hard to generate names for things that would otherwise be bnodes, and using named resources instead of bnodes will not increase overhead during querying.
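Generating names for would-be bnodes can be as simple as the following sketch. It assumes triples are plain tuples of strings and that blank nodes are written with a "_:" prefix; the function name skolemize and the example.org base IRI are placeholders, not something any particular store prescribes.

```python
import uuid


def skolemize(triples, base="http://example.org/.well-known/genid/"):
    """Replace each blank node label (a term starting with '_:') with a generated IRI."""
    names = {}  # blank node label -> generated IRI, so repeated labels stay linked

    def rename(term):
        if isinstance(term, str) and term.startswith("_:"):
            if term not in names:
                names[term] = base + uuid.uuid4().hex
            return names[term]
        return term

    return [(rename(s), p, rename(o)) for s, p, o in triples]


named = skolemize([
    ("nm:John", "nm:owns", "_:Node"),
    ("_:Node", "nm:has", "nm:Hats"),
])
```

Both occurrences of _:Node end up with the same generated name, so the link between the two statements survives the renaming.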
That does not help you recurse down and grab data, but for that, I don't recommend doing such general queries; they don't scale well and usually return more than you want or need. I'd suggest you do more directed queries. Your original construct query will pull down the contents of the entire database, that's generally not what you want.
Lastly, while describe can be useful, there's not a standard implementation; the SPARQL spec doesn't define any particular behavior, so what it returns is left to the database vendor, and it can be different. That can make your code less portable if you plan on trying different databases with your application. If you want a specific behavior out of describe, you're best off implementing it yourself. Doing something like the concise bounded description for a resource is an easy piece of code, though you can run into some headaches around Bnodes.
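A simplified version of that idea, sketched in Python over an in-memory list of tuples: start from one resource, collect its triples, and keep following objects that are blank nodes. It leaves out the reification part of the concise bounded description definition, and it assumes blank nodes are strings starting with "_:"; the function name bounded_description and the sample data are made up.

```python
def bounded_description(triples, resource):
    """Collect the triples about `resource`, following blank-node objects recursively."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))

    result = []
    seen = set()  # subjects already expanded; guards against cycles between bnodes
    pending = [resource]
    while pending:
        subject = pending.pop()
        if subject in seen:
            continue
        seen.add(subject)
        for triple in by_subject.get(subject, []):
            result.append(triple)
            obj = triple[2]
            if isinstance(obj, str) and obj.startswith("_:"):
                pending.append(obj)
    return result


data = [
    ("nm:John", "nm:owns", "_:Node"),
    ("_:Node", "nm:has", "nm:Hats"),
    ("_:Node", "nm:partOf", "_:Node"),   # self-reference, to show the cycle guard
    ("nm:Hats", "nm:colour", "nm:Red"),  # named resource: not followed
]
description = bounded_description(data, "nm:John")
```

The seen set is one of the bnode headaches mentioned above: without it, a blank node that points back to itself would be expanded forever.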
With regard to working with the ruby RDF.rb library, which allows SPARQL queries with significant convenience methods on RDF::Graph objects, the following should expand blank nodes.
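# rdf is assumed to be an already loaded RDF::Graph (or other queryable object).
# It is passed in explicitly because a top-level local is not visible inside a method.

# Recursively collect every statement reachable from a blank node.
def rdf_expand_blank_nodes(rdf, object)
  g = RDF::Graph.new
  if object.node?
    rdf.query([object, nil, nil]) do |s, p, o|
      g << [s, p, o]
      g << rdf_expand_blank_nodes(rdf, o) if o.node?
    end
  end
  g
end

rdf_type = RDF::SCHEMA.Person # for example
rdf.query([nil, RDF.type, rdf_type]).each_subject do |subject|
  g = RDF::Graph.new
  rdf.query([subject, nil, nil]) do |s, p, o|
    g << [s, p, o]
    g << rdf_expand_blank_nodes(rdf, o) if o.node?
  end
  # g now holds the subject's statements with its blank nodes expanded
end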