Does Jena TDB load all data into memory every time?

I am new to Jena. I am trying to work with the YAGO dataset using TDB. The dataset is about 200 MB, and every time I run the same query it takes about 5 minutes to load the data before it returns the results. Am I misunderstanding some part of TDB? My code is below.
String directory = "tdb";
Dataset dataset = TDBFactory.createDataset(directory);
dataset.begin(ReadWrite.WRITE);
Model tdb = dataset.getDefaultModel();
//String source = "yagoMetaFacts.ttl";
//FileManager.get().readModel(tdb, source);
String queryString = "SELECT DISTINCT ?p WHERE { ?s ?p ?o. }";
Query query = QueryFactory.create(queryString);
try(QueryExecution qexec = QueryExecutionFactory.create(query, tdb)){
ResultSet results = qexec.execSelect();
ResultSetFormatter.out(System.out, results, query) ;
}
dataset.commit();
dataset.end();

There are two ways to load data into TDB: through the API or from the command line. Many thanks to @ASKW and @AndyS.
1 Load data via API
This code needs to be executed only once, especially the readModel line, which takes a long time.
String directory = "tdb";
Dataset dataset = TDBFactory.createDataset(directory);
dataset.begin(ReadWrite.WRITE);
Model tdb = dataset.getDefaultModel();
String source = "yagoMetaFacts.ttl";
FileManager.get().readModel(tdb, source);
dataset.commit(); //Important!! This is to commit the data to tdb.
dataset.end();
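Since the load only has to happen once, one option is to guard it with a check on the store directory. The sketch below is only an illustration in plain Java, assuming that a missing or empty directory means nothing has been loaded yet. The class LoadOnceSketch and the method needsInitialLoad are hypothetical helpers, not part of Jena.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper (not a Jena class): decides whether the one-time load is still needed.
final class LoadOnceSketch {

    // Assumption: a missing or empty directory means no data has been loaded yet.
    static boolean needsInitialLoad(Path tdbDir) {
        if (!Files.isDirectory(tdbDir)) {
            return true;
        }
        try (Stream<Path> entries = Files.list(tdbDir)) {
            return !entries.findAny().isPresent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get("tdb");
        if (needsInitialLoad(dir)) {
            System.out.println("Store is empty: run the load code once.");
        } else {
            System.out.println("Store already populated: go straight to querying.");
        }
    }
}
```

Run the check before calling TDBFactory.createDataset, because creating the dataset may itself write files into the directory.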
After the data is loaded into TDB, we can use the following code to query it. It is not necessary to load the data again.
String directory = "path\\to\\tdb";
Dataset dataset = TDBFactory.createDataset(directory);
dataset.begin(ReadWrite.READ); // a read transaction is enough for querying
try {
    Model tdb = dataset.getDefaultModel();
    String queryString = "SELECT DISTINCT ?p WHERE { ?s ?p ?o. }";
    Query query = QueryFactory.create(queryString);
    try (QueryExecution qexec = QueryExecutionFactory.create(query, tdb)) {
        ResultSet results = qexec.execSelect();
        ResultSetFormatter.out(System.out, results, query);
    }
} finally {
    dataset.end(); // always close the transaction
}
2 Load data via CMD
To load data
>tdbloader --loc=path\to\tdb path\to\dataset.ttl
To query
>tdbquery --loc=path\to\tdb --query=q1.rq
q1.rq is the file that contains the query.
You should get results like this:
-------------------------------------------------------
| p |
=======================================================
| <http://yago-knowledge.org/resource/hasGloss> |
| <http://yago-knowledge.org/resource/occursSince> |
| <http://yago-knowledge.org/resource/occursUntil> |
| <http://yago-knowledge.org/resource/byTransport> |
| <http://yago-knowledge.org/resource/hasPredecessor> |
| <http://yago-knowledge.org/resource/hasSuccessor> |
| <http://www.w3.org/2000/01/rdf-schema#comment> |
-------------------------------------------------------

Related

How do I extract the fragment value (the part after #) in a SPARQL query?

I wrote the following Jena query in Eclipse, which produces the URI and description of medical disorders pertaining to the inner ear.
public static void main(String[] args) {
String FOAF = "http://xmlns.com/foaf/0.1/";
String NS = "http://philshields.altervista.org/owl/_";
String rdf= "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
String owl= "http://www.w3.org/2002/07/owl#" ;
String xsd= "http://www.w3.org/2001/XMLSchema#";
String rdfs= "http://www.w3.org/2000/01/rdf-schema#";
String dc= "http://purl.org/dc/elements/1.1/";
Model model = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM );
//model = FileManager.get().loadModel("c:/Ontologies/ICD-10 ontologies/2017_10_26_ICD10_AM_Code Order.owl");
model = FileManager.get().loadModel("c:/jena/ICD.owl");
String resultsAsString;
String queryString = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\r\n" +
"PREFIX owl: <http://www.w3.org/2002/07/owl#>\r\n" +
"PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\r\n" +
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\r\n" +
"PREFIX dc: <http://purl.org/dc/elements/1.1/>\r\n" +
"SELECT (str(?subject) AS ?string) ?object\r\n" +
" WHERE {?subject dc:title ?object \r\n" +
"FILTER regex(?object,\"inner ear\",\"i\")}\r\n" +
"" ;
Query query = QueryFactory.create(queryString) ;
try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) {
ResultSet results = qexec.execSelect() ;
// ResultSetFormatter.out consumes the whole result set, so no loop is needed
ResultSetFormatter.out(System.out, results, query) ;
}
}
It produces the correct results shown here:
-------------------------------------------------------------------------------------------------------------------------------------
| string | object |
=====================================================================================================================================
| "http://www.semanticweb.org/philshields/ontologies/2017/8/untitled-ontology-341#H83.3" | "Noise effects on inner ear" |
| "http://www.semanticweb.org/philshields/ontologies/2017/8/untitled-ontology-341#S01.38" | "Open wound of inner ear" |
| "http://www.semanticweb.org/philshields/ontologies/2017/8/untitled-ontology-341#H83.8" | "Other specified diseases of inner ear" |
| "http://www.semanticweb.org/philshields/ontologies/2017/8/untitled-ontology-341#H83.9" | "Disease of inner ear, unspecified" |
I am trying to discard the rest of the URI and keep only the fragment value, which is the code number of the disorder after the hash.
I tried using SELECT (str(?subject) AS ?string) ?object, but it does not work.
SPARQL way:
SELECT (strafter(str(?subject), "#") as ?fragment) WHERE
{
#... Something...
}
Java way:
Resource subject = soln.getResource("subject"); // soln is a QuerySolution; ?subject must be in the SELECT clause
String fragment = subject.getLocalName(); // returns the local name, which here is the fragment
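If you only have the URI as a plain string, the same idea as STRAFTER can be sketched in plain Java. This is just an illustration: FragmentSketch and strAfter are hypothetical names, and the method mirrors how I understand STRAFTER to behave (text after the first occurrence of the separator, or an empty string if the separator is absent).

```java
// Hypothetical helper mirroring SPARQL's STRAFTER on plain strings.
final class FragmentSketch {

    // Returns the text after the first occurrence of separator, or "" if it does not occur.
    static String strAfter(String text, String separator) {
        int idx = text.indexOf(separator);
        return idx < 0 ? "" : text.substring(idx + separator.length());
    }

    public static void main(String[] args) {
        String uri = "http://www.semanticweb.org/philshields/ontologies/2017/8/untitled-ontology-341#H83.3";
        System.out.println(strAfter(uri, "#")); // prints H83.3
    }
}
```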

Simple SPARQL query does not return any results

I am just getting up and running with Blazegraph in embedded mode. I load a few sample triples and am able to retrieve them with a "select all" query:
SELECT * WHERE { ?s ?p ?o }
This query returns all my sample triples:
[s=<<<http://github.com/jschmidt10#person_Thomas>, <http://github.com/jschmidt10#hasAge>, "30"^^<http://www.w3.org/2001/XMLSchema#int>>>;p=blaze:history:added;o="2017-01-15T16:11:15.909Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>]
[s=<<<http://github.com/jschmidt10#person_Tommy>, <http://github.com/jschmidt10#hasLastName>, "Test">>;p=blaze:history:added;o="2017-01-15T16:11:15.909Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>]
[s=<<<http://github.com/jschmidt10#person_Tommy>, <http://www.w3.org/2002/07/owl#sameAs>, <http://github.com/jschmidt10#person_Thomas>>>;p=blaze:history:added;o="2017-01-15T16:11:15.909Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>]
[s=<http://github.com/jschmidt10#person_Thomas>;p=<http://github.com/jschmidt10#hasAge>;o="30"^^<http://www.w3.org/2001/XMLSchema#int>]
[s=<http://github.com/jschmidt10#person_Tommy>;p=<http://github.com/jschmidt10#hasLastName>;o="Test"]
[s=<http://github.com/jschmidt10#person_Tommy>;p=<http://www.w3.org/2002/07/owl#sameAs>;o=<http://github.com/jschmidt10#person_Thomas>]
Next I try a simple query for a particular subject:
SELECT * WHERE { <http://github.com/jschmidt10#person_Thomas> ?p ?o }
This query yields no results. It seems that none of my queries for a URI are working. I am able to get results when I query for a literal (e.g. ?s ?p "Test").
The API I am using to create my query is BigdataSailRepositoryConnection.prepareQuery().
Code snippet (Scala) that executes and generates the query:
val props = BasicRepositoryProvider.getProperties("./graph.jnl")
val sail = new BigdataSail(props)
val repo = new BigdataSailRepository(sail)
repo.initialize()
val query = "SELECT ?p ?o WHERE { <http://github.com/jschmidt10#person_Thomas> ?p ?o }"
val cxn = repo.getConnection
cxn.begin()
var res = cxn.
prepareTupleQuery(QueryLanguage.SPARQL, query).
evaluate()
while (res.hasNext) println(res.next)
cxn.close()
repo.shutDown()
Have you checked how you populated the database? Some characters may have been encoded strangely, or you may have excess brackets in your objects.
Judging from the printed output, your URIs have extra angle brackets. You are likely using:
val subject = valueFactory.createURI("<http://some.url/some/entity>")
when you should be doing this (without angle brackets):
val subject = valueFactory.createURI("http://some.url/some/entity")
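If the URI strings come from somewhere you don't control, you could normalise them before they reach createURI. Here is a minimal sketch in plain Java (callable from Scala as well); IriStrings and stripAngleBrackets are hypothetical names, not part of Blazegraph or Sesame.

```java
// Hypothetical helper: removes one pair of surrounding angle brackets, if present.
final class IriStrings {

    static String stripAngleBrackets(String iri) {
        String s = iri.trim();
        if (s.length() >= 2 && s.startsWith("<") && s.endsWith(">")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(stripAngleBrackets("<http://some.url/some/entity>")); // prints http://some.url/some/entity
        System.out.println(stripAngleBrackets("http://some.url/some/entity"));   // prints http://some.url/some/entity
    }
}
```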

Jena Text query performance slows down dramatically with large dataset

I am querying an RDF dataset of 2.37 GB with approximately 17 million triples, for which a Lucene index is also maintained. I tried the text queries of the jena-text module, which search using the stored Lucene index. But performance is poor: a search query takes 4 seconds or more.
However, when I open the index in the Lucene index viewer Luke, the index seems to have no problem, and searching it for a particular term takes only a few milliseconds.
So I cannot work out why it takes so much longer through jena-text.
Following is the SPARQL query:
SELECT ?subj ?status ?version ?label
WHERE {
?subj rdf:type ts:Valueset;
text:query 'cancer';
ts:entityStatus ?status;
OPTIONAL { ?subj ts:versionID ?version . } .
OPTIONAL { ?subj rdfs:label ?label . } .
}
LIMIT <limit>
OFFSET <offset>
Here is the Jena code:
store.getDataset().begin(ReadWrite.READ) ;
Query query = QueryFactory.create(queryStr);
QueryExecution qexec = QueryExecutionFactory.create(query , store.getDataset()) ;
ResultSet results = qexec.execSelect();
while(results.hasNext()){
QuerySolution qs = results.next();
And here is the code for creating the indexed dataset:
Dataset baseDS = TDBFactory.createDataset(storePath.trim());
//define index mapping
EntityDefinition entityDef = new EntityDefinition("uri", "property", RDFS.label.asNode());
entityDef.set("property", TS.conceptCode.asNode());
entityDef.set("property", SKOS_XL.literalForm.asNode());
entityDef.set("property", SKOS.note.asNode());
entityDef.set("property", SKOS.definition.asNode());
//create in file lucene
File indexDir = new File(textIndexPath);
Directory luceneDir = null;
try {
luceneDir = FSDirectory.open(indexDir);
} catch (IOException e) {
e.printStackTrace();
}
// Join together into a dataset
Dataset indexedDS = TextDatasetFactory.createLucene(baseDS, luceneDir, entityDef) ;
Can anyone identify whether there is a problem with the code or with the way the indexed dataset is configured? Thanks.
It seems this is a known issue, I am having problems with it too :(
https://issues.apache.org/jira/browse/JENA-999

Simple Jena SPARQL query not working

What am I doing wrong here?
public class SimpleSearchTest {
public static void main(String[] args) throws Exception {
Model model = ModelFactory.createDefaultModel();
model.getGraph().add(new Triple(Node.createURI("a"), Node.createURI("b"), Node.createURI("c")));
String queryString = "SELECT ?p ?o WHERE { <a> ?p ?o }";
Query query = QueryFactory.create(queryString);
QueryExecution qExec = QueryExecutionFactory.create(query, model);
ResultSetFormatter.out(qExec.execSelect());
}
}
I am expecting
-------------
| p | o |
=============
| <b> | <c> |
-------------
But instead I am getting no results:
---------
| p | o |
=========
---------
I am sure it is something dumb...
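I think the SPARQL parser isn't liking your <a> because it's a relative reference rather than an absolute URI (though it's odd that you don't get a warning). Most likely the parser resolves <a> against a base URI, so the query ends up looking for a different, absolute URI than the bare "a" stored in the graph. If you change your code as follows: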
model.getGraph().add(new Triple(Node.createURI("http://example.com/a"), Node.createURI("b"), Node.createURI("c")));
String queryString = "SELECT ?p ?o WHERE { <http://example.com/a> ?p ?o }";
you get the result you are expecting.
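To see why a bare "a" can stop matching, here is a small plain-Java sketch of relative reference resolution using java.net.URI. It is only an analogy for what a SPARQL parser may do with <a>; the base http://example.com/base/ and the names RelativeIriSketch and resolveAgainst are made up for the illustration.

```java
import java.net.URI;

// Illustration only: shows how a relative reference changes once it is resolved against a base.
final class RelativeIriSketch {

    static String resolveAgainst(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // "a" on its own is not absolute
        System.out.println(URI.create("a").isAbsolute()); // prints false
        // resolved against a base it becomes a different string than "a"
        System.out.println(resolveAgainst("http://example.com/base/", "a")); // prints http://example.com/base/a
        // an absolute URI is left untouched by resolution
        System.out.println(resolveAgainst("http://example.com/base/", "http://example.com/a")); // prints http://example.com/a
    }
}
```

With an absolute URI on both sides, the string stored in the graph and the one in the query are the same, so the match succeeds.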
Parenthetically, by creating the test graph with Node.createURI() you are using the lower-level internal Graph API, rather than the more normally used Model API. It's perfectly fine to do this, but the Graph API generally assumes you know more what you are doing, and may have fewer checks against doing the unexpected.

Sparql Query With Inferencing

I have some RDF and RDFS files and I want to use Jena's SPARQL implementation to query them. My code looks like this:
//model of my rdf file
Model model = ModelFactory.createMemModelMaker().createDefaultModel();
model.read(inputStream1, null);
//model of my ontology (word net) file
Model onto = ModelFactory.createOntologyModel( OntModelSpec.RDFS_MEM_RDFS_INF);
onto.read( inputStream2,null);
String queryString =
"PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
+ "PREFIX wn:<http://www.webkb.org/theKB_terms.rdf/wn#> "
+ "SELECT ?person "
+ "WHERE {"
+ " ?person rdf:type wn:Person . "
+ " }";
Query query = QueryFactory.create(queryString);
QueryExecution qe = QueryExecutionFactory.create(query, ????);
ResultSet results = qe.execSelect();
ResultSetFormatter.out(System.out, results, query);
qe.close();
I also have a WordNet ontology in an RDF file, and I want to use this ontology in my query to do inferencing automatically (when I query for Person, the query should also return e.g. Man and Woman).
How do I link the ontology to my query? Please help.
Update: now I have two models. Which one should I run my query against?
QueryExecution qe = QueryExecutionFactory.create(query, ????);
Thanks in advance.
The key is to recognise that, in Jena, Model is one of the central abstractions. An inferencing model is just a Model in which some of the triples are present because they are entailed by inference rules rather than read in from the source document. Thus you only need to change the first line of your example, where you create the model initially.
While you can create inference models directly, it's often easiest just to create an OntModel with the required degree of inference support:
Model model = ModelFactory.createOntologyModel( OntModelSpec.RDFS_MEM_RDFS_INF );
If you want a different reasoner, or OWL support, you can select a different OntModelSpec constant. Be aware that large and/or complex models can make for slow queries.
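To make "entailed by inference rules" a bit more concrete, here is a toy sketch in plain Java. It is not how Jena's reasoner is implemented; it only mimics the RDFS subclass rule on hand-made maps, and every name in it (SubclassInferenceSketch, subclassesOf, instancesOf, the sample data) is invented for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the RDFS subclass rule: an instance of a subclass is also an instance of the superclass.
final class SubclassInferenceSketch {

    // Returns target plus every direct or indirect subclass of it.
    static Set<String> subclassesOf(String target, Map<String, List<String>> directSubclasses) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(target);
        while (!todo.isEmpty()) {
            String cls = todo.pop();
            if (seen.add(cls)) {
                for (String sub : directSubclasses.getOrDefault(cls, List.of())) {
                    todo.push(sub);
                }
            }
        }
        return seen;
    }

    // Returns, in sorted order, the instances whose asserted type is target or one of its subclasses.
    static Set<String> instancesOf(String target,
                                   Map<String, List<String>> directSubclasses,
                                   Map<String, String> assertedType) {
        Set<String> classes = subclassesOf(target, directSubclasses);
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : assertedType.entrySet()) {
            if (classes.contains(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> sub = Map.of("Person", List.of("Man", "Woman"));
        Map<String, String> types = Map.of("alice", "Woman", "bob", "Man", "rex", "Dog");
        // Nobody is asserted to be a Person, but alice and bob are entailed
        System.out.println(instancesOf("Person", sub, types)); // prints [alice, bob]
    }
}
```

An inference model gives you the same effect through the ordinary Model interface, which is why the query itself does not need to change.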
Update (following edit of original question)
To reason over two models, you want the union. You can do this through OntModel's sub-model facility. I would change your example as follows (note: I haven't tested this code, but it should work):
String rdfFile = "... your RDF file location ...";
Model source = FileManager.get().loadModel( rdfFile );
String ontFile = "... your ontology file location ...";
Model ont = FileManager.get().loadModel( ontFile );
OntModel m = ModelFactory.createOntologyModel( OntModelSpec.RDFS_MEM_RDFS_INF, ont ); // OntModel, since addSubModel is not defined on Model
m.addSubModel( source );
String queryString =
"PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
+ "PREFIX wn:<http://www.webkb.org/theKB_terms.rdf/wn#> "
+ "SELECT ?person "
+ "WHERE {"
+ " ?person rdf:type wn:Person . "
+ " }";
Query query = QueryFactory.create(queryString);
QueryExecution qe = QueryExecutionFactory.create(query, m);
ResultSet results = qe.execSelect();
ResultSetFormatter.out(System.out, results, query);
qe.close();