ARQ: Making a Query from Scratch

I have a problem building a query from scratch, either syntactically or in the algebra, based on
https://jena.apache.org/documentation/query/manipulating_sparql_using_arq.html
For example, I have the query below:
SELECT (count(?instance) AS ?count)
WHERE
{ ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://data.linkedmdb.org/resource/movie/film> }
Its algebra form is:
(project (?count)
  (extend ((?count ?.0))
    (group () ((?.0 (count ?instance)))
      (bgp (triple ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.linkedmdb.org/resource/movie/film>)))))
Can anyone point me to sample code for building the above query from scratch?
I have tried to build it syntactically, but I cannot work out how to alias the aggregation above.
If anybody can at least show me how to include an aggregate, with its alias, in the projection, that would be great.

I don't typically construct queries through code, since I can just parse a query string or use a parameterized SPARQL query, but here's a reconstruction of your query using the API. Most of the methods I used here I found by exploring the autocomplete options in Eclipse and by looking at the Javadoc.
import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.Triple;
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.sparql.core.Var;
import com.hp.hpl.jena.sparql.expr.ExprAggregator;
import com.hp.hpl.jena.sparql.expr.ExprVar;
import com.hp.hpl.jena.sparql.expr.aggregate.AggCountVar;
import com.hp.hpl.jena.sparql.syntax.ElementTriplesBlock;
import com.hp.hpl.jena.vocabulary.RDF;
public class QueryBuilding {
    public static void main(String[] args) {
        // Create the query and make it a SELECT query.
        final Query query = QueryFactory.create();
        query.setQuerySelectType();

        // Set the projection expression.
        final ExprVar instance = new ExprVar( "instance" );
        query.getProject().add( Var.alloc( "count" ), new ExprAggregator( instance.asVar(), new AggCountVar( instance )));

        // Construct the triples pattern and add it.
        final ElementTriplesBlock triples = new ElementTriplesBlock();
        final Node film = Node.createURI( "http://data.linkedmdb.org/resource/movie/film" );
        triples.addTriple( new Triple( instance.getAsNode(), RDF.type.asNode(), film ));
        query.setQueryPattern( triples );

        // Show the query.
        System.out.println( query );
    }
}
The output (i.e., the printed query) follows. It's the same as your query, modulo some whitespace and newlines.
SELECT (count(?instance) AS ?count)
WHERE
{ ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.linkedmdb.org/resource/movie/film> .}
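As an aside, the parameterized-query route mentioned at the top of this answer would look roughly like this. This is only a sketch: it assumes ParameterizedSparqlString from the same ARQ distribution, and the placeholder name ?type is arbitrary.
import com.hp.hpl.jena.query.ParameterizedSparqlString;
import com.hp.hpl.jena.query.Query;
public class ParameterizedQueryBuilding {
    public static void main(String[] args) {
        // Fixed query text; the class to count is left as a placeholder variable.
        final ParameterizedSparqlString pss = new ParameterizedSparqlString(
            "SELECT (count(?instance) AS ?count) WHERE { ?instance a ?type }" );
        // Bind the placeholder to the LinkedMDB film class.
        pss.setIri( "type", "http://data.linkedmdb.org/resource/movie/film" );
        final Query query = pss.asQuery();
        System.out.println( query );
    }
}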

Although the solution proposed by Joshua was very helpful to me and produces the correct String output, I have found that it contains a problem. The line:
query.getProject().add( Var.alloc( "count" ), new ExprAggregator( instance.asVar(), new AggCountVar( instance )));
should be replaced by:
query.getProject().add( Var.alloc( "count" ), query.allocAggregate( new AggCountVar( instance ) ));
Otherwise, if you execute the query against a Model, you will get the exception "NotAVariableException: Node_variable (not a Var) found".
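To make the fix concrete, here is a minimal sketch (not from the original answers) of the corrected projection together with an execution. The Model variable model is an assumption, standing in for any model that holds the data:
// Corrected projection: let the Query allocate its internal
// aggregator variable (the ?.0 seen in the algebra form).
query.getProject().add( Var.alloc( "count" ),
    query.allocAggregate( new AggCountVar( instance )));
// Execute against some Model 'model' (assumed to exist).
QueryExecution qexec = QueryExecutionFactory.create( query, model );
try {
    ResultSet results = qexec.execSelect();
    while ( results.hasNext() ) {
        System.out.println( results.next().get( "count" ));
    }
} finally {
    qexec.close();
}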


How to filter distinct regex matches with SPARQL?

I am querying a (kind of) bibliographic database and would like to find all the distinct matches of a certain regex (matching the signatures of typescripts (TS) and manuscripts (MS)); i.e., I would like to return all documents that are currently in the database.
I came up with:
SELECT ?document
WHERE
{
  {
    ?documentURI a witt:MS;
        rdfs:label ?document.
  }
  UNION
  {
    ?documentURI a witt:TS;
        rdfs:label ?document.
  }
  FILTER (regex(?document, "(Ms|Ts)\\-((1|2|3)\\d{2}\\w?\\d?)"))
}
(endpoint); this returns all the signatures, but I would like to filter the result down to the distinct regex matches, i.e. the distinct signatures up to and excluding the comma.
How can this be achieved?
OK, I think I found a solution with strbefore:
SELECT DISTINCT ?document
WHERE
{
  {
    ?documentURI a witt:MS;
        rdfs:label ?documentFull.
  }
  UNION
  {
    ?documentURI a witt:TS;
        rdfs:label ?documentFull.
  }
  BIND (strbefore(?documentFull, ",") AS ?document)
}
Try it.
I would appreciate opinions on the query, though: is this effective/good style?

With PHRETS v2, Can I Sort the Data Fields Alphabetically? (Had this when using V1)

This code works in terms of retrieving data:
<?php
date_default_timezone_set('America/Phoenix');
require_once("composer/vendor/autoload.php");

$config = new \PHRETS\Configuration;
$config->setLoginUrl('my_url')
       ->setUsername('my_user')
       ->setPassword('my_pass')
       ->setRetsVersion('1.7.2');

$rets = new \PHRETS\Session($config);
$connect = $rets->Login();

$system = $rets->GetSystemMetadata();
echo "Server Name: " . $system->getSystemDescription();

$property_classes = ['Property'];
foreach ($property_classes as $pc) {
    // generate the DMQL query
    $query = "(BedroomsTotal=1+),(MlsStatus=ACT,PND)";
    $results = $rets->Search('Property', $pc, $query);
    file_put_contents('MyFolder/Property_' . $pc . '.csv', $results->toCSV());
} // end foreach property class
?>
I would like to know how to sort the fields alphabetically, in order to keep the fields in a predictable order, which could also be used in an SQL CREATE TABLE statement. I had this ability with v1.
I would also like to be able to loop through the data fields with a FOREACH-style statement, in order to create a customized field delimiter; a custom delimiter helps avoid import errors in cases where the default delimiter also appears within the data, such as quotes and commas within the remarks section.
Any help is much appreciated. :)

Searching semantically tagged documents in MarkLogic

Can anyone please point me to some simple examples of semantic tagging, and of querying semantically tagged documents, in MarkLogic?
I am fairly new to this area, so some beginner-level examples will do.
When you say "semantically tagged" do you mean regular XML documents that happen to have some triples in them? The discussion and examples at http://docs.marklogic.com/guide/semantics/embedded are pretty good for that.
Start by enabling the triple index in your database. Then insert a test doc. This is just XML, but the sem:triple element represents a semantic fact.
xdmp:document-insert(
  'test.xml',
  <test>
    <source>AP Newswire</source>
    <sem:triple date="1972-02-21" confidence="100"
        xmlns:sem="http://marklogic.com/semantics">
      <sem:subject>http://example.org/news/Nixon</sem:subject>
      <sem:predicate>http://example.org/wentTo</sem:predicate>
      <sem:object>China</sem:object>
    </sem:triple>
  </test>)
Then query it. The example query is pretty complicated. To understand what's going on I'd insert variations on that sample document, using different URIs instead of just test.xml, and see how the various query terms match up. Try using just the SPARQL component, without the extra cts query. Try cts:search with no SPARQL, just the cts:query.
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
    at "/MarkLogic/semantics.xqy";
sem:sparql('
  SELECT ?country
  WHERE {
    <http://example.org/news/Nixon> <http://example.org/wentTo> ?country
  }
  ',
  (),
  (),
  cts:and-query((
    cts:path-range-query( "//sem:triple/@confidence", ">", 80),
    cts:path-range-query( "//sem:triple/@date", "<", xs:date("1974-01-01")),
    cts:or-query((
      cts:element-value-query( xs:QName("source"), "AP Newswire"),
      cts:element-value-query( xs:QName("source"), "BBC"))))))
In case you are talking about enriching your content using semantic technology: that is not directly provided by MarkLogic.
You can enrich your content externally, for instance by calling a public service like the one provided by OpenCalais, and then add the enrichments to the content before insert.
You can also build lists of lookup values, and then use cts:highlight to mark such terms within your content. That could be as simple as:
let $labels := ("MarkLogic", "StackOverflow")
return
cts:highlight($doc, cts:word-query($labels), <b>{$cts:text}</b>)
Or with a more dynamic replacement using SPARQL:
let $labels := map:new()
let $_ :=
  for $result in sem:sparql('
    PREFIX demo: <http://www.marklogic.com/ontologies/demo#>
    SELECT DISTINCT ?label
    WHERE {
      ?s a demo:person.
      {
        ?s demo:fullName ?label
      } UNION {
        ?s demo:initialsName ?label
      } UNION {
        ?s demo:email ?label
      }
    }
  ')
  return
    map:put($labels, map:get($result, 'label'), 'person')
return
  cts:highlight($doc, cts:word-query(map:keys($labels)),
    let $result := sem:sparql(concat('
      PREFIX demo: <http://www.marklogic.com/ontologies/demo#>
      SELECT DISTINCT ?s ?p
      {
        ?s a demo:', map:get($labels, $cts:text), ' .
        ?s ?p "', $cts:text, '" .
      }
    '))
    return
      if (map:contains($labels, $cts:text))
      then
        element { xs:QName(fn:concat("demo:", map:get($labels, $cts:text))) } {
          attribute subject { map:get($result, 's') },
          attribute predicate { map:get($result, 'p') },
          $cts:text
        }
      else ()
  )
HTH!

Jena Text query performance slows down dramatically with large dataset

I am querying an RDF dataset of 2.37 GB with approximately 17 million triples in it, and a Lucene index of the dataset is also maintained. I tried the text queries of the jena-text module, which searches on the basis of the stored Lucene indexes, but its performance is quite slow: it takes 4 or more seconds per search query.
However, when I use the Lucene index viewer 'luke', the indexes seem to have no problem, and when I search for a particular term in the indexes it takes only a few milliseconds.
So the problem is that I am unable to work out why it takes so much time with jena-text.
Following is the SPARQL query:
SELECT ?subj ?status ?version ?label
WHERE {
  ?subj rdf:type ts:Valueset ;
        text:query 'cancer' ;
        ts:entityStatus ?status .
  OPTIONAL { ?subj ts:versionID ?version . }
  OPTIONAL { ?subj rdfs:label ?label . }
}
LIMIT <limit>
OFFSET <offset>
Here is the Jena code:
store.getDataset().begin(ReadWrite.READ) ;
Query query = QueryFactory.create(queryStr);
QueryExecution qexec = QueryExecutionFactory.create(query, store.getDataset()) ;
ResultSet results = qexec.execSelect();
while (results.hasNext()) {
    QuerySolution qs = results.next();
    // ... process the solution, e.g. qs.get("subj"), qs.get("label") ...
    // (the snippet in the question was truncated here; loop closed and
    // resources released for completeness)
}
qexec.close();
store.getDataset().end();
And here is the code for creating the indexed dataset.
Dataset baseDS = TDBFactory.createDataset(storePath.trim());

// define the index mapping
EntityDefinition entityDef = new EntityDefinition("uri", "property", RDFS.label.asNode());
entityDef.set("property", TS.conceptCode.asNode());
entityDef.set("property", SKOS_XL.literalForm.asNode());
entityDef.set("property", SKOS.note.asNode());
entityDef.set("property", SKOS.definition.asNode());

// create the on-disk Lucene index directory
File indexDir = new File(textIndexPath);
Directory luceneDir = null;
try {
    luceneDir = FSDirectory.open(indexDir);
} catch (IOException e) {
    e.printStackTrace();
}

// Join together into a dataset
Dataset indexedDS = TextDatasetFactory.createLucene(baseDS, luceneDir, entityDef) ;
Can anyone identify whether there is a problem with the code, or with the way the indexed dataset is configured? Thanks.
It seems this is a known issue; I am having problems with it too :(
https://issues.apache.org/jira/browse/JENA-999
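While that issue remains open, one mitigation worth trying (my suggestion, based on the jena-text documentation rather than on JENA-999 itself) is the list form of text:query, which caps the number of hits requested from Lucene and can help when a term matches many documents. A sketch of the reworked query string, with an arbitrary cap of 100:
// Same pattern as in the question, but asking Lucene for at most 100 hits.
String queryStr =
    "SELECT ?subj ?status ?version ?label " +
    "WHERE { " +
    "  ?subj text:query ('cancer' 100) ; " +
    "        rdf:type ts:Valueset ; " +
    "        ts:entityStatus ?status . " +
    "  OPTIONAL { ?subj ts:versionID ?version } " +
    "  OPTIONAL { ?subj rdfs:label ?label } " +
    "}";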

How to parse elements from Sparql algebra

I can get the algebra form from a SPARQL query string using the ARQ algebra (com.hp.hpl.jena.sparql.algebra):
String queryStr =
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>" +
    "SELECT DISTINCT ?name ?nick" +
    "{?x foaf:mbox <mailt:person#server> ." +
    "?x foaf:name ?name" +
    "OPTIONAL { ?x foaf:nick ?nick }}";
Query query = QueryFactory.create(queryStr);
Op op = Algebra.compile(query);
Printing the returned value of op gives the following. (Note, incidentally, that the concatenated string fragments lack separating spaces, so ?name and the keyword OPTIONAL merge into the single variable ?nameOPTIONAL; that is why the algebra below shows a join rather than a leftjoin.)
(distinct
(project (?name ?nick)
(join
(bgp
(triple ?x <http://xmlns.com/foaf/0.1/mbox> <mailt:person#server>)
(triple ?x <http://xmlns.com/foaf/0.1/name> ?nameOPTIONAL)
)
(bgp (triple ?x <http://xmlns.com/foaf/0.1/nick> ?nick)))))
The returned value is an Op, but I can't find any direct methods that parse the op into its elements, e.g. the basic graph patterns of subject, predicate, and object, and the relations between these graph patterns.
Any hint is appreciated, thanks.
Why serialise out the algebra at all?
If your aim is to walk the algebra tree and extract the BGPs, then you can do this using the OpVisitor interface, of which there are various implementations that will get you started. The particular method you care about is visit(OpBGP opBGP), since from there you can access the methods of the OpBGP class to extract the pattern information.
It might be too late, but since I don't see a final answer yet, here is the code for printing all BGPs.
Create a class as follows that extends OpVisitorBase and overrides the public void visit(final OpBGP) method, as given below in the code.
Then, from your code, simply call the function:
MyOpVisitorBase.myOpVisitorWalker(op);
import java.util.List;
import com.hp.hpl.jena.graph.Triple;
import com.hp.hpl.jena.sparql.algebra.Op;
import com.hp.hpl.jena.sparql.algebra.OpVisitorBase;
import com.hp.hpl.jena.sparql.algebra.OpWalker;
import com.hp.hpl.jena.sparql.algebra.op.OpBGP;

public class MyOpVisitorBase extends OpVisitorBase
{
    public static void myOpVisitorWalker(Op op)
    {
        // A static method has no 'this'; walk with a fresh visitor instance.
        OpWalker.walk(op, new MyOpVisitorBase());
    }

    @Override
    public void visit(final OpBGP opBGP) {
        final List<Triple> triples = opBGP.getPattern().getList();
        for (final Triple triple : triples) {
            System.out.println("Triple: " + triple.toString());
        }
    }
}
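For completeness, a minimal sketch of how this connects to the code in the question:
Query query = QueryFactory.create(queryStr);
Op op = Algebra.compile(query);
MyOpVisitorBase.myOpVisitorWalker(op);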