Jena StmtIterator and database


I have my model stored in a persistent triple store.
I want to select all individuals related to a given document name.
I can do it in two ways:
1) A SPARQL query:
PREFIX base:<http://example#>
select ?s2 ?p2 ?o2
where {
{?doc base:fullName <file:/c:/1.txt>; ?p1 ?o1
} UNION {
?s2 ?p2 ?o2 ;
base:documentID ?doc }
}
Question: how do I create a Jena Model from the ResultSet?
2) StmtIterator stmtIterator = model.listStatements(...)
The problem with this approach is that I need to call model.listStatements(...) several times:
a) get the document URI by document name
b) get the list of individual IDs related to this document URI
c) finally, get the collection of individuals
I am concerned about performance: running model.listStatements(...) three times means many database requests.
Or is all the data read from the database into memory (I doubt it) when the model is created, as below?
Dataset ds = RdfStoreFactory.connectDataset(store, conn);
GraphStore graphStore = GraphStoreFactory.create(ds) ;

You need to back up a little and think more clearly about what you are trying to do. Your SPARQL query, once it's corrected (see below), will do a perfectly good job of producing an iterator over the result set, which gives you the properties of each of the documents you're looking for. Specifically, each row of the result set contains one binding for each of s2, p2 and o2. That's what you ask for when you specify select ?s2 ?p2 ?o2. And it's normally what you want: usually, we select values out of the triple store in order to process them in some way (e.g. rendering them into a list in the UI), and for that an iterator over the results is exactly what we need.
You can have the query return a model rather than a result set by using a SPARQL CONSTRUCT or DESCRIBE query. However, you then need to iterate over the resources in the model, so you aren't much further forward (except that your model is smaller and in memory).
Your query, incidentally, can be improved. The variables p1 and o1 make the query engine do useless work since you never use them, and there's no need for a union. Corrected, your query should be:
PREFIX base:<http://example#>
select ?s2 ?p2 ?o2
where {
?doc base:fullName <file:/c:/1.txt> .
?s2 base:documentID ?doc ;
?p2 ?o2 .
}
To execute any query (SELECT, DESCRIBE or CONSTRUCT) from Java, see the Jena documentation.
You can efficiently achieve the same results as your query using the model API. For example, (untested):
Model m = ..... ; // your model here
String baseNS = "http://example#";
Resource fileName = m.createResource( "file:/c:/1.txt" );

// create RDF property objects for the properties we need. This can be done in
// a vocab class, or automated with schemagen
Property fullName = m.createProperty( baseNS + "fullName" );
Property documentID = m.createProperty( baseNS + "documentID" );

// find the ?doc with the given fullName
for (ResIterator i = m.listSubjectsWithProperty( fullName, fileName ); i.hasNext(); ) {
    Resource doc = i.next();

    // find all of the ?s2 whose documentID is ?doc
    for (StmtIterator j = m.listStatements( null, documentID, doc ); j.hasNext(); ) {
        Resource s2 = j.next().getSubject();

        // list the properties of ?s2
        for (StmtIterator k = s2.listProperties(); k.hasNext(); ) {
            Statement stmt = k.next();
            Property p2 = stmt.getPredicate();
            RDFNode o2 = stmt.getObject();
            // do something with s2 p2 o2 ...
        }
    }
}
Note that your schema design makes this more complex than it needs to be. If, for example, the document full name resource had a base:isFullNameOf property, then you could simply do a lookup to get the doc. Similarly, it's not clear why you need to distinguish between doc and s2: why not simply attach the document properties to the doc resource?
Finally: no, opening a database connection does not load the entire DB into memory. However, TDB in particular does make extensive use of caching of regions of the graph in order to make queries more efficient.

Related

Retrieve only one label from set of individuals with multiple labels

Suppose I have the following set of individuals, where some of them have more than one rdfs:label:
Individual   Label
ind_01       "ind-001"
ind_02       "ind-002"
ind_02       "ind-2"
ind_03       "label3"
ind_04       "ind-4"
ind_04       "ind-04"
...          ...
I would like to run a SPARQL query that retrieves only one label per individual, no matter which one (i.e., the choice can be totally arbitrary). Thus, a suitable output based on the above dataset would be:
Individual   Label
ind_01       "ind-001"
ind_02       "ind-002"
ind_03       "label3"
ind_04       "ind-4"
...          ...

You could use SAMPLE, which "returns an arbitrary value from the multiset passed to it".
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?individual (SAMPLE(?label) AS ?label_sample)
WHERE {
  ?individual rdfs:label ?label .
}
GROUP BY ?individual
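To make the grouping behaviour concrete, here is a rough Python analogue of what GROUP BY plus SAMPLE does, using the rows from the question. It is only an illustration of the semantics, not how a SPARQL engine implements it; the function name is made up, and this sketch happens to keep the first label it sees, whereas a real engine may return any of them.

```python
def sample_label_per_individual(rows):
    """Keep one label per individual; which one is kept is arbitrary."""
    chosen = {}
    for individual, label in rows:
        # setdefault stores the label only if the individual has none yet.
        chosen.setdefault(individual, label)
    return chosen


rows = [
    ("ind_01", "ind-001"),
    ("ind_02", "ind-002"),
    ("ind_02", "ind-2"),
    ("ind_03", "label3"),
    ("ind_04", "ind-4"),
    ("ind_04", "ind-04"),
]

result = sample_label_per_individual(rows)
for individual, label in sorted(result.items()):
    print(individual, label)
```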

SPARQL Restrict Number of Results for Specific Variable

Suppose I want to look for some first degree neighbors of Berlin. I ask the following query:
select ?s ?p where {
?s ?p dbr:Berlin.
}
Is it possible to put a restriction on the return result, such that there are at most 5 results for each unique value of ?p?

My attempts with subqueries all time out...
But as a potentially useful, if not exactly perfect, solution, maybe GROUP_CONCAT, MAX/MIN or SAMPLE are of use?
SELECT
?writer (GROUP_CONCAT(?namestring; SEPARATOR = " ") AS ?namestrings)
(MIN(?namestring) AS ?min_name)
(MAX(?namestring) AS ?max_name)
(SAMPLE(?namestring) AS ?random_name)
(SAMPLE(?namestring) AS ?another_random_name_that_may_unfortunately_be_the_same_again)
WHERE {
?writer wdt:P31 wd:Q5;
wdt:P166 wd:Q37922;
wdt:P735 ?firstname.
?firstname wdt:P1705 ?namestring.
}
GROUP BY ?writer
HAVING ((COUNT(?writer)) > 2 )
LIMIT 20
See it live here.
And, as you can see, SAMPLE is apparently evaluated only once, so using it repeatedly does not get you closer to five (different) samples.
(You can leave out the HAVING clause for your use case. I only included it to restrict the results to useful examples.)
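If the server-side variants keep timing out, one fallback (a different technique from the query above) is to fetch the plain ?s ?p results and trim them in client code. A minimal sketch, assuming the rows have already been fetched and converted into dictionaries with "s" and "p" keys; that row shape and the function name are assumptions, and this only helps when the untrimmed result set is small enough to download.

```python
from collections import defaultdict


def cap_per_key(rows, key, limit=5):
    """Return rows in their original order, keeping at most `limit` per value of `key`."""
    counts = defaultdict(int)
    kept = []
    for row in rows:
        if counts[row[key]] < limit:
            counts[row[key]] += 1
            kept.append(row)
    return kept


# Hypothetical rows standing in for the ?s ?p bindings of the Berlin query.
rows = [{"s": "s%d" % i, "p": "p1"} for i in range(8)]
rows.append({"s": "x", "p": "p2"})

capped = cap_per_key(rows, "p", limit=5)
```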

How to get UUID() from INSERT Sparql Request?

I would like to know whether it is possible to retrieve the urn: generated by UUID() when using an INSERT statement in a SPARQL query.
My problem is simple, but I don't know how to solve it using SPARQL:
I would like to store a lot of timestamp values. The same timestamp can appear multiple times, so I guess I can use UUID() to generate a random URI.
I need the urn: from the UUID() function to relate my new triples.
Am I right?
Or is UUID() not the solution?
Thanks for your help.
EDIT:
OK, I should add that I would like to retrieve the data in my Python code.
I am using SPARQLWrapper to run my requests.
If I create one INSERT request like this:
INSERT {
?uuid1 rdf:type capt:ECHANTILLON .
?uuid1 capt:Nom ?uuid1 .
?uuid1 capt:Description '2019-08-07 16:07:34.027636' .
?uuid1 capt:Valeur '1565182189' .
} WHERE {
FILTER NOT EXISTS { ?uuid1 rdf:type capt:ECHANTILLON} .
BIND (UUID() AS ?uuid1) .
};
Then another INSERT request using ?uuid1 from the first:
INSERT {
?uuid2 rdf:type capt:VALEUR .
?uuid2 capt:Nom ?uuid2 .
?uuid2 capt:Description '2019-08-07 16:07:34.027659' .
?uuid2 capt:Valeur '27.0' .
?uuid1 capt:A_Pour_Valeur ?uuid2 .  # <===
} WHERE {
FILTER NOT EXISTS { ?uuid2 rdf:type capt:VALEUR} .
BIND (UUID() AS ?uuid2) .
};
What I want is :
uuid1 = endpoint.setQuery(request_1)
request_2.addSubject(uuid1)
uuid2 = endpoint.setQuery(request_2)
Something like that.
How can I retrieve ?uuid1 from the first request if INSERT does not return this value? I would like to make two requests if possible.
Do I have to combine the two requests into one, or do I have to run a SELECT ?uuid1 request before running the second?

You can't get the value of the UUID directly from the SPARQL update - if you want to retrieve it via SPARQL somehow, you'll have to do a query after you've inserted it - or, of course, you could adapt your second SPARQL update to do the selection for you by querying for the 'correct' UUID in its WHERE clause.
However, in this case, that looks difficult to do, and I think the easiest solution for you is that you don't create the UUID in the SPARQL operation itself, but instead create it in code and then use that in your query string, e.g. by adding a VALUES clause, something like this:
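import uuid
# str() is needed: uuid.uuid1() returns a UUID object, which cannot be concatenated to a string
uuid1 = str(uuid.uuid1())
update = """INSERT { .... }
WHERE { VALUES ?uuid1 { <urn:uuid:""" + uuid1 + """> }
FILTER NOT EXISTS .... (etc) }"""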
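Generating the identifiers in code can be extended to both requests. The following is only a sketch under some assumptions: the capt: namespace IRI and the helper names are made up, and INSERT DATA is used instead of INSERT ... WHERE on the assumption that a freshly generated URN needs no FILTER NOT EXISTS check. It only builds the update strings; sending them (for example with SPARQLWrapper) is left out.

```python
import uuid

# The capt: namespace IRI below is a placeholder, not the asker's real one.
PREFIXES = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX capt: <http://example.org/capt#>
"""


def new_urn():
    """Create a fresh urn:uuid: IRI on the client side."""
    return "urn:uuid:" + str(uuid.uuid4())


def sample_update(sample_urn, description, value):
    """Build the first update, which creates the ECHANTILLON resource."""
    return PREFIXES + f"""INSERT DATA {{
  <{sample_urn}> rdf:type capt:ECHANTILLON ;
      capt:Description "{description}" ;
      capt:Valeur "{value}" .
}}"""


def value_update(sample_urn, value_urn, description, value):
    """Build the second update, which creates the VALEUR and links it to the sample."""
    return PREFIXES + f"""INSERT DATA {{
  <{value_urn}> rdf:type capt:VALEUR ;
      capt:Description "{description}" ;
      capt:Valeur "{value}" .
  <{sample_urn}> capt:A_Pour_Valeur <{value_urn}> .
}}"""


uuid1 = new_urn()
uuid2 = new_urn()
request_1 = sample_update(uuid1, "2019-08-07 16:07:34.027636", "1565182189")
request_2 = value_update(uuid1, uuid2, "2019-08-07 16:07:34.027659", "27.0")
```

Because uuid1 is an ordinary Python string, the second request can reuse it without asking the store for anything.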

SPARQL property path, 1st part is unique, path repeats after

Assume we have the triples
a1 blah <http://foo.com/f>
a1 pt c1
c1 pt d1
How would I construct a SPARQL query to select a1 or c1 or d1? I need to be able to walk one blah predicate or multiple pt predicates.
So a1 can be part of a triple like: a1 blah <http://foo.com/f>
but I don't know how to specify the case where it's not a1, but an object connected to a1 through another path in the other direction. I also need to retain the path using blah, since that is how a1 or f1 is connected to the object.
Here is how I got it to work, but it's not pretty:
select * where
{
  # other sparql stuff before it
  { ?s blah <http://foo.com/f> .
    ?s pt+ <c1> }
  UNION
  { <c1> blah <http://foo.com/f> }
}
I can change 'c1' to 'a1' or 'd1' and the above SPARQL still works.

AKSW's comment was exactly right: you need to use a sequence, not an alternative. Based on your example, it sounds like you actually want subjects that are connected to a specific object by a chain of zero or one occurrences of "blah", followed by zero or more occurrences of "pt".
If you want to do this for specific predicates p and q, that you can specify with IRI references, then it's just
?x p?/q* ?y
The * following the q means zero or more, and the ? following the p means zero or one.
However, if you don't know the specific IRIs in advance, e.g., you want to use variables, in something like:
?x (?p)?/(?q)* ?y
then you may be out of luck. SPARQL property paths don't support variables. You can get wildcards in property paths with something like s|!s, since every URI is either s or it isn't, but there's no provision for putting repeatable variables into property paths.
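To see what a sequence path such as p?/q* matches, it can help to evaluate it by hand over a tiny graph. The sketch below is a toy evaluator in Python, not how any SPARQL engine works internally; the function names are made up, the triples are the ones from the question with plain strings in place of IRIs, and the start node is assumed to be fixed.

```python
def step(triples, nodes, predicate):
    """Nodes reachable from `nodes` by exactly one `predicate` edge."""
    return {o for (s, p, o) in triples if p == predicate and s in nodes}


def zero_or_one(triples, nodes, predicate):
    """The `p?` operator: stay in place, or follow one edge."""
    return set(nodes) | step(triples, nodes, predicate)


def zero_or_more(triples, nodes, predicate):
    """The `q*` operator: follow any number of edges, including none."""
    reached = set(nodes)
    frontier = set(nodes)
    while frontier:
        frontier = step(triples, frontier, predicate) - reached
        reached |= frontier
    return reached


def p_opt_then_q_star(triples, start, p, q):
    """The sequence `p?/q*`: apply `p?` first, then `q*` to its results."""
    return zero_or_more(triples, zero_or_one(triples, {start}, p), q)


TRIPLES = {
    ("a1", "blah", "f"),
    ("a1", "pt", "c1"),
    ("c1", "pt", "d1"),
}

from_a1 = p_opt_then_q_star(TRIPLES, "a1", "blah", "pt")
from_c1 = p_opt_then_q_star(TRIPLES, "c1", "blah", "pt")
```

Starting from a1, the optional blah step adds f, and the pt* step then adds c1 and d1. Starting from c1 there is no blah edge, so only the pt* part contributes.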

Why filter doesn't work in this context?

This is the query and the result:
As you can see, I am filtering out the users that are bo:ania, so why do they still appear?
However, if I remove the wildcard and select just the users (?user), bo:ania doesn't appear.
I didn't provide a minimum data example because this is a question about how filter and wildcard work, not about a problem in extracting some data from a data set. However, if you need a minimum data, I'm more than happy to provide it.

?specificUser is bound to bo:ania by your VALUES statement. ?user is an entirely different binding defined by the other triple patterns. Your FILTER says to filter out results where ?user = bo:ania, and it appears to be doing that correctly, seeing that ?user is not bound to bo:ania in any of the results.
BTW, there isn't a need to use VALUES in this case unless you want to inspect multiple values. If it's just the one value, then the following would work, and not have you wondering why the binding to bo:ania is included in the result set:
SELECT *
WHERE {
  ?user a rs:user .
  ?user rs:hasRated ?rating .
  ?rating rs:hasRatingDate ?ratingDate .
  FILTER (?ratingDate >= (now() -"P10000F"^^xsd:duration) )
  FILTER (?user != bo:ania)
}
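To see why bo:ania still shows up under SELECT * when VALUES is used, it can help to mimic the join by hand. This is a rough Python sketch, assuming the original query contained something like VALUES ?specificUser { bo:ania } (the query itself is not reproduced above); the users bo:bob and bo:carol are made up for illustration.

```python
from itertools import product

# One row contributed by the assumed VALUES ?specificUser { bo:ania } clause.
values_rows = [{"specificUser": "bo:ania"}]

# Hypothetical rows produced by the triple patterns that bind ?user.
pattern_rows = [{"user": "bo:ania"}, {"user": "bo:bob"}, {"user": "bo:carol"}]

# The two parts share no variable, so joining them is a cross product.
joined = [{**v, **p} for v, p in product(values_rows, pattern_rows)]

# FILTER (?user != bo:ania) only looks at the ?user column.
filtered = [row for row in joined if row["user"] != "bo:ania"]

for row in filtered:
    print(row)
```

Every surviving row still carries specificUser = bo:ania, which is what SELECT * displays; selecting only ?user hides that column.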
