How to parse elements from Sparql algebra - sparql

I can get the algebra form from sparql query string using ARQ algebra (com.hp.hpl.jena.sparql.algebra):
String queryStr =
"PREFIX foaf: <http://xmlns.com/foaf/0.1/>" +
"SELECT DISTINCT ?name ?nick" +
"{?x foaf:mbox <mailt:person#server> ." +
"?x foaf:name ?name" +
"OPTIONAL { ?x foaf:nick ?nick }}";
Query query = QueryFactory.create(queryStr);
Op op = Algebra.compile(query);
Print the returned value of op:
(distinct
(project (?name ?nick)
(join
(bgp
(triple ?x <http://xmlns.com/foaf/0.1/mbox> <mailt:person#server>)
(triple ?x <http://xmlns.com/foaf/0.1/name> ?nameOPTIONAL)
)
(bgp (triple ?x <http://xmlns.com/foaf/0.1/nick> ?nick)))))
Returned value is an Op type, but I can't find any direct methods that can parse the op into elements, e.g., basic graph patterns of s, p, o, and the relations between these graph patterns.
Any hint is appreciated, thanks.

Why serialise out the algebra at all?
If your aim is to walk the algebra tree and extract the BGPs then you can do this using the OpVisitor interface of which there are various implementations of that will get you started. The particular method you would care about is visit(OpBgp opBgp) since then you can access the methods of the OpBgp class to extract the pattern information

It might be too late but still as I am not seeing final answer so I am writing the code for printing all BGP.
For that create the class as follows that extends OpVisitorBase and override the public void visit(final OpBGP) function as given below in the code.
And from your code simply call the function:
MyOpVisitorBase.myOpVisitorWalker(op);
public class MyOpVisitorBase extends OpVisitorBase
{
public static void myOpVisitorWalker(Op op)
{
OpWalker.walk(op, this);
}
#Override
public void visit(final OpBGP opBGP) {
final List<Triple> triples = opBGP.getPattern().getList();
int i = 0;
for (final Triple triple : triples) {
System.out.println("Triple: "+triple.toString());
}
}
}

Related

Access subclassOf/subclassOf level using ForwardChainingRDFSInferencer with SPARQL

I am running some SPARQL queries using the class ForwardChainingRDFSInferencer, which basic constructs an inferencer. For my examples I use the schema.org ontology.
My code looks like the following example
MemoryStore store = new MemoryStore();
ForwardChainingRDFSInferencer inferencer = new ForwardChainingRDFSInferencer(store); //the inference class
Repository repo = new SailRepository(inferencer);
repo.initialize();
URL data = new URL("file:/home/user/Documents/schemaorg-current-https.nt");
RDFFormat fileRDFFormat = RDFFormat.NTRIPLES;
RepositoryConnection con = repo.getConnection();
con.add(data, null, fileRDFFormat);
System.out.println("Repository loaded");
String queryString =
" PREFIX schema: <https://schema.org/>" +
" SELECT DISTINCT ?subclassName_one" +
" WHERE { " +
" ?type rdf:type rdfs:Class ." +
" ?type rdfs:subClassOf/rdfs:subClassOf? schema:Thing ." +
" ?type rdfs:label ?subclassName_one ." +
" }";
TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
TupleQueryResult result = tupleQuery.evaluate();
while (result.hasNext()) {
BindingSet bindingSet = result.next();
System.out.println(bindingSet.toString());
stdout.println(bindingSet.toString());
}
repo.close()
What I want is to print two subclass levels down the class Thing. So if for example we have,
Thing > Action (sub-class 1) > ConsumeAction (sub-class 2) > DrinkAction
What I want is to return the class CosumeAction which is two levels (subclasses) down from class Thing, while using the inference java class.
The current query, as given in the code sample above, returns all the classes and subclasses from every class of the schema.org ontology. Thus, there is something that I do wrong while using the inference class.
You could remove the classes you are not interested in with FILTER NOT EXISTS:
FILTER NOT EXISTS {
?type rdfs:subClassOf+/rdfs:subClassOf/rdfs:subClassOf schema:Thing .
}
Examples
For DrinkAction (level 3):
FILTER NOT EXISTS {
?type
rdfs:subClassOf+ / # schema:ConsumeAction
rdfs:subClassOf / # schema:Action
rdfs:subClassOf schema:Thing .
}
For WearAction (level 4):
FILTER NOT EXISTS {
?type
rdfs:subClassOf+ / # schema:UseAction / schema:ConsumeAction
rdfs:subClassOf / # schema:Action
rdfs:subClassOf schema:Thing .
}

Why are some of the classes in my ontolgoy not being returned in my SPARQL query?

I'm rather confused about why I'm receiving too little results (256 expected, 224 returned). When I run the code below, everything returns exactly as I want it, except that I miss all the classes in my ontology which lie at the highest level, or one below the highest level. I don't understand where I'm being too "strict" in my query so that these classes are not being returned in the table. They also all have parent classes (whether that be the topmost class "top concept" of the ontology, or something like an enumeration type class, either way, the leaf should still be found. Would be thankful for tips or pointers where my code might be inadvertently filtering out those classes.
SELECT DISTINCT ?leaf ?parentclasses
WHERE {
GRAPH <>
#a leaf is the lowest level class: A class that does not have a subclass
{
{
{
#I want anything which is a class
{
?leaf rdf:type owl:Class.
}
#i also want the subclass of any superclass if that exists
{
?leaf rdfs:subClassOf ?superclass .
}
#squeezed to specific section of OTL.
filter strstarts(str(?leaf), "URIgoeshere")
#Only keep the results that do not have a preflabel
OPTIONAL {
?leaf skos:prefLabel ?subclasslabel.
}
#make sure the subclasslabel is in dutch
#filter( langMatches(lang(?subclasslabel),"nl") )
#give me the label of the superclass
OPTIONAL {
?superclass skos:prefLabel ?superclasslabel.
}
#make sure it's in dutch
FILTER (lang(?superclasslabel) = "nl")
#if it exists, give me also the superclass of the superclass creating a supersuperclass
{
?superclass rdfs:subClassOf ?supersuperclass.
#give me the label of the supersuperclass
OPTIONAL {
?supersuperclass skos:prefLabel ?supersuperclasslabel.
}
#make sure it's in dutch
FILTER (lang(?supersuperclasslabel) = "nl")
#keep the leafs that are NOT The values whereby the subclass is not empty. (double negative for removing leafs where the subclass has a subclass below it)
FILTER NOT EXISTS {
?subclass rdfs:subClassOf ?leaf
FILTER (?subclass != owl:Nothing )
}
#concatenate the two parentclass variables into one
BIND(concat(str(?superclasslabel), str("-"), str(?supersuperclasslabel) ) as ?parentclasses)
}
}
}
}
}
Here is a ttl file with the same structure as my database: https://file.io/jjwkAWbK4jrF
Below is my end solution to my problem. It was more complicated than I expected, but it works.
The problem was that these classes that didn't have a parent class were not being accepted by the query. With some union cases this could be covered for 0 parent classes, 1 parent class or 2 parent classes.
SELECT DISTINCT ?leaf ?parentclasses
WHERE {
GRAPH <>
#a leaf is the lowest level class: A class that does not have a subclass
{{{{{
#I want anything which is a class
{?leaf rdf:type owl:Class.}
#squeezed
filter strstarts(str(?leaf), "graph")
#keep the leafs that are NOT The values whereby the subclass is not empty.
(double negative for removing leafs where the subclass has a subclass below it)
FILTER NOT EXISTS {?subclass rdfs:subClassOf ?leaf
FILTER (?subclass != owl:Nothing ) }
}
{
{?leaf rdfs:subClassOf ?superclass .}
#grab dutch label if available
optional {
?superclass skos:prefLabel ?superclassnllabel .
filter( langMatches(lang(?superclassnllabel),"nl") )
}
# take either as the label, but dutch over empty
bind( coalesce( ?superclassnllabel, replace(str(?
superclass),"^[^#]*#", "" ) ) as ?superclasslabel )
{
{?superclass rdfs:subClassOf ?supersuperclass.}
#grab dutch label if available
?supersuperclass skos:prefLabel ?supersuperclassnllabel .
filter( langMatches(lang(?supersuperclassnllabel),"nl") )
# take either as the label, but dutch over empty
bind( coalesce( ?supersuperclassnllabel, replace(str(?
supersuperclass),"^[^#]*#", "" ) ) as ?supersuperclasslabel )
BIND(concat(str(?superclasslabel), str(" - "), str(?
supersuperclasslabel) ) as ?parentclasses)
}
union
{
{?superclass ?p ?o.filter(!isblank(?superclass))}
FILTER NOT EXISTS {?superclass rdfs:subClassOf ?supersuperclass}
BIND(concat(str(?superclasslabel), str(" - "), str("Top
Concept") ) as ?parentclasses)
#concatenate the two parentclass variables into one
}
}
}
union
{
#figure this out, WHY IS IT HERE?
{?leaf rdf:type owl:Class .filter(!isblank(?leaf))}
FILTER strstarts(str(?leaf), "graph")
FILTER NOT EXISTS {?leaf rdfs:subClassOf ?superclass}
FILTER NOT EXISTS {?subclass rdfs:subClassOf ?leaf
FILTER (?subclass != owl:Nothing ) }
BIND (str("Top Class") as ?parentclasses )
}
}}}}

How to convert a class expression (from restriction unionOf) to a string?

A SPARQL query returns a result with restrictions with allValuesFrom and unionOf. I need do concat these values, but, when I use bind or str functions, the result is blank.
I tried bind, str and group_concat functions, but, all of it was unsuccessful. Group_concat return a blank node.
SELECT DISTINCT ?source ?is_succeeded_by
WHERE {
?source rdfs:subClassOf ?restriction .
?restriction owl:onProperty j.0:isSucceededBy .
?restriction owl:allValuesFrom ?is_succeeded_by .
FILTER (REGEX(STR(?source), 'gatw-Invoice_match'))
}
Result of SPARQL query in Protegé:
You can hardly obtain strings like 'xxx or yyy' programmatically in Jena,
since it is Manchester Syntax, an OWL-API native format, and it is not supported by Jena.
Any class expression is actually b-node, there are no such builtin symbols like 'or' in raw RDF.
To represent any anonymous class expression as a string, you can use ONT-API,
which is a jena-based OWL-API, and, therefore, both SPARQL and Manchester Syntax are supported there.
Here is an example based on pizza ontology:
// use pizza, since no example data provided in the question:
IRI pizza = IRI.create("https://raw.githubusercontent.com/owlcs/ont-api/master/src/test/resources/ontapi/pizza.ttl");
// get OWLOntologyManager instance from ONT-API
OntologyManager manager = OntManagers.createONT();
// as extended Jena model:
OntModel model = manager.loadOntology(pizza).asGraphModel();
// prepare query that looks like the original, but for pizza
String txt = "SELECT DISTINCT ?source ?is_succeeded_by\n" +
"WHERE {\n" +
" ?source rdfs:subClassOf ?restriction . \n" +
" ?restriction owl:onProperty :hasTopping . \n" +
" ?restriction owl:allValuesFrom ?is_succeeded_by .\n" +
" FILTER (REGEX(STR(?source), 'Am'))\n" +
"}";
Query q = new Query();
q.setPrefixMapping(model);
q = QueryFactory.parse(q, txt, null, Syntax.defaultQuerySyntax);
// from owlapi-parsers package:
OWLObjectRenderer renderer = new ManchesterOWLSyntaxOWLObjectRendererImpl();
// from ont-api (although it is a part of internal API, it is public):
InternalObjectFactory iof = new SimpleObjectFactory(manager.getOWLDataFactory());
// exec SPARQL query:
try (QueryExecution exec = QueryExecutionFactory.create(q, model)) {
ResultSet res = exec.execSelect();
while (res.hasNext()) {
QuerySolution qs = res.next();
List<Resource> vars = Iter.asStream(qs.varNames()).map(qs::getResource).collect(Collectors.toList());
if (vars.size() != 2)
throw new IllegalStateException("For the specified query and valid OWL must not happen");
// Resource (Jena) -> OntCE (ONT-API) -> ONTObject (ONT-API) -> OWLClassExpression (OWL-API)
OWLClassExpression ex = iof.getClass(vars.get(1).inModel(model).as(OntClass.class)).getOWLObject();
// format: 'class local name' ||| 'superclass string in ManSyn'
System.out.println(vars.get(0).getLocalName() + " ||| " + renderer.render(ex));
}
}
The output:
American ||| MozzarellaTopping or PeperoniSausageTopping or TomatoTopping
AmericanHot ||| HotGreenPepperTopping or JalapenoPepperTopping or MozzarellaTopping or PeperoniSausageTopping or TomatoTopping
Used env: ont-api:2.0.0, owl-api:5.1.11, jena-arq:3.13.1

Searching semantically tagged documents in MarkLogic

Can any one please point me to some simple examples of semantic tagging and querying semantically tagged documents in MarkLogic?
I am fairly new in this area,so some beginner level examples will do.
When you say "semantically tagged" do you mean regular XML documents that happen to have some triples in them? The discussion and examples at http://docs.marklogic.com/guide/semantics/embedded are pretty good for that.
Start by enabling the triple index in your database. Then insert a test doc. This is just XML, but the sem:triple element represents a semantic fact.
xdmp:document-insert(
'test.xml',
<test>
<source>AP Newswire</source>
<sem:triple date="1972-02-21" confidence="100">
<sem:subject>http://example.org/news/Nixon</sem:subject>
<sem:predicate>http://example.org/wentTo</sem:predicate>
<sem:object>China</sem:object>
</sem:triple>
</test>)
Then query it. The example query is pretty complicated. To understand what's going on I'd insert variations on that sample document, using different URIs instead of just test.xml, and see how the various query terms match up. Try using just the SPARQL component, without the extra cts query. Try cts:search with no SPARQL, just the cts:query.
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
at "/MarkLogic/semantics.xqy";
sem:sparql('
SELECT ?country
WHERE {
<http://example.org/news/Nixon> <http://example.org/wentTo> ?country
}
',
(),
(),
cts:and-query((
cts:path-range-query( "//sem:triple/#confidence", ">", 80) ,
cts:path-range-query( "//sem:triple/#date", "<", xs:date("1974-01-01")),
cts:or-query((
cts:element-value-query( xs:QName("source"), "AP Newswire"),
cts:element-value-query( xs:QName("source"), "BBC"))))))
In case you are talking about enriching your content using semantic technology, that is not directly provided by MarkLogic.
You can enrich your content externally, for instance by calling a public service like the one provided by OpenCalais, and then insert the enrichments to the content before insert.
You can also build lists of lookup values, and then using cts:highlight to mark such terms within your content. That could be as simple as:
let $labels := ("MarkLogic", "StackOverflow")
return
cts:highlight($doc, cts:word-query($labels), <b>{$cts:text}</b>)
Or with a more dynamic replacement using spraql:
let $labels := map:new()
let $_ :=
for $result in sem:sparql('
PREFIX demo: <http://www.marklogic.com/ontologies/demo#>
SELECT DISTINCT ?label
WHERE {
?s a demo:person.
{
?s demo:fullName ?label
} UNION {
?s demo:initialsName ?label
} UNION {
?s demo:email ?label
}
}
')
return
map:put($labels, map:get($result, 'label'), 'person')
return
cts:highlight($doc, cts:word-query(map:keys($labels)),
let $result := sem:sparql(concat('
PREFIX demo: <http://www.marklogic.com/ontologies/demo#>
SELECT DISTINCT ?s ?p
{
?s a demo:', map:get($labels, $cts:text), ' .
?s ?p "', $cts:text, '" .
}
'))
return
if (map:contains($labels, $cts:text))
then
element { xs:QName(fn:concat("demo:", map:get($labels, $cts:text))) } {
attribute subject { map:get($result, 's') },
attribute predicate { map:get($result, 'p') },
$cts:text
}
else ()
)
HTH!

ARQ Making a Query from Scratch

I have a problem in building the query from scratch syntactically or in algebra, based on
https://jena.apache.org/documentation/query/manipulating_sparql_using_arq.html
For example I have the below query
SELECT (count(?instance) AS ?count)
WHERE
{ ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://data.linkedmdb.org/resource/movie/film> }
(project (?count)
(extend ((?count ?.0))
(group () ((?.0 (count ?instance)))
(bgp (triple ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.linkedmdb.org/resource/movie/film>)))))
Can any one direct me with a sample code of how to build the above query from scratch?
I have tried to build it syntactically but failing to know about how to alias the aggregation above.
If anybody can at least guide me in including aggregation with its aliasing name in projection it will be very great.
I don't typically construct queries through code, since I can just parse a query string, or use a parameterized SPARQL query, but here's a reconstruction of your query using the API. Most of the methods I used here I found by exploring the autocomplete options in Eclipse, and by looking at the Javadoc.
import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.Triple;
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.sparql.core.Var;
import com.hp.hpl.jena.sparql.expr.ExprAggregator;
import com.hp.hpl.jena.sparql.expr.ExprVar;
import com.hp.hpl.jena.sparql.expr.aggregate.AggCountVar;
import com.hp.hpl.jena.sparql.syntax.ElementTriplesBlock;
import com.hp.hpl.jena.vocabulary.RDF;
public class QueryBuilding {
public static void main(String[] args) {
// Create the query and make it a SELECT query.
final Query query = QueryFactory.create();
query.setQuerySelectType();
// Set the projection expression.
final ExprVar instance = new ExprVar( "instance" );
query.getProject().add( Var.alloc( "count" ), new ExprAggregator( instance.asVar(), new AggCountVar( instance )));
// Construct the triples pattern and add it.
final ElementTriplesBlock triples = new ElementTriplesBlock();
final Node film = Node.createURI( "http://data.linkedmdb.org/resource/movie/film" );
triples.addTriple( new Triple( instance.getAsNode(), RDF.type.asNode(), film ));
query.setQueryPattern( triples );
// Show the query
System.out.println( query );
}
}
The output (i.e., the printed query) follows. It's the same as your query, modulo some whitespace location and newlines.
SELECT (count(?instance) AS ?count)
WHERE
{ ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.linkedmdb.org/resource/movie/film> .}
Although the solution proposed by Joshua was very helpful to me and produces the correct String output, I have found that it contains a problem; the line :
query.getProject().add( Var.alloc( "count" ), new ExprAggregator( instance.asVar(), new AggCountVar( instance )));
Should be replaced by :
query.getProject().add( Var.alloc( "count" ), query.allocAggregate( new AggCountVar( instance ) ));
Otherwise if you execute the query against a Model you will get an exception "NotAVariableException: Node_variable (not a Var) found"