Is it possible to programmatically add an OPTIONAL clause to a SPARQL query using the Jena ARQ API?
I would like to programmatically take this query:
select ?concept ?p ?o where {?s ?p ?o . } limit 10
To this:
SELECT ?concept ?p ?o ?test WHERE
{
?s ?p ?o
OPTIONAL { ?concept <http://www.test.com/test> ?test }
}
LIMIT 10
Through ARQ it's simple to add the additional result variable ?test:
Query q = QueryFactory.create(query);
q.addResultVar(var);
But from what I've found in the API docs and trawling across the net it's not possible to add an OPTIONAL clause. Do I need to use a different library?
Yes, you can. See this introduction to the topic on the Apache Jena site.
Your starting point is getting the query pattern:
Element pattern = q.getQueryPattern();
That will be an ElementGroup if I remember correctly. Add the optional in there:
((ElementGroup) pattern).addElement(new ElementOptional(...));
The ... bit will be an ElementTriplesBlock, which is pretty straightforward.
Inelegant, however. In general I'd recommend using visitors and the algebra representation, but this direct route should work.
Is there a way to match without case sensitivity in GraphDB?
The test dataset is pretty small. Around 8m triples.
SELECT ?s ?name
WHERE {
?s <http://www.sample.org.uk/data/schema/simplename/name> ?name.
?s <http://www.sample.org.uk/data/schema/provider> "nhle".
OPTIONAL {?s <http://www.sample.org.uk/data/schema/county/> "Essex"}
OPTIONAL {?s <http://www.sample.org.uk/data/schema/district/> "Epping Forest"}
OPTIONAL {?s <http://www.sample.org.uk/data/schema/parish/> "Buckhurst Hill"}
}
I can, of course, use FILTER - but it takes a good seven seconds to return which is too slow.
SELECT ?s ?name ?county ?district ?parish
WHERE {
?s <http://www.sample.org.uk/data/schema/simplename/name> ?name.
?s <http://www.sample.org.uk/data/schema/provider> "nhle".
OPTIONAL {?s <http://www.sample.org.uk/data/schema/county/> ?county}
OPTIONAL {?s <http://www.sample.org.uk/data/schema/district/> ?district}
OPTIONAL {?s <http://www.sample.org.uk/data/schema/parish/> ?parish}
FILTER (lcase(?county)='essex'
&& lcase(?district)='epping forest'
&& lcase(?parish)='buckhurst hill'
)
}
This may not be a direct answer to your question, sorry for that (I cannot make a comment).
Since you already know you want to match against the county of Essex, as opposed to anything with the label "essex" or "Essex", it might be better to use the URI for that county instead of a label. The URI could for instance be:
<http://www.wikidata.org/entity/Q23240>
At least this will prevent you from accidentally matching something completely different with the label "Essex", e.g. the whaleship Essex (Wikipedia link).
Of course I'm not aware of what your data looks like, so this may be of no use to you. Still worth pointing out, hopefully.
Whenever I start using SQL I tend to throw a couple of exploratory statements at the database in order to understand what is available, and what form the data takes.
e.g.
show tables
describe table
select * from table
Could anyone help me understand the way to complete a similar exploration of an RDF datastore using a SPARQL endpoint?
Well, the obvious first start is to look at the classes and properties present in the data.
Here is how to see what classes are being used:
SELECT DISTINCT ?class
WHERE {
?s a ?class .
}
LIMIT 25
OFFSET 0
(LIMIT and OFFSET are there for paging. It is worth getting used to these especially if you are sending your query over the Internet. I'll omit them in the other examples.)
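As an illustration of the paging mentioned above, the LIMIT/OFFSET pair can be generated rather than typed by hand. This is only a minimal Python sketch: the helper name paged_queries is made up for this example, and it assumes the base query carries no LIMIT or OFFSET of its own. Note that without an ORDER BY, a store is not obliged to return rows in the same order on every request, so pages may overlap.

```python
def paged_queries(base_query, page_size, pages):
    """Return `pages` copies of base_query, each with its own LIMIT/OFFSET.

    Hypothetical helper: it only appends text, it does not parse the query.
    """
    base = base_query.rstrip()
    return [
        f"{base}\nLIMIT {page_size}\nOFFSET {page * page_size}"
        for page in range(pages)
    ]


# The class-listing query from above, without its paging clauses.
CLASSES = """SELECT DISTINCT ?class
WHERE {
  ?s a ?class .
}"""

for query in paged_queries(CLASSES, page_size=25, pages=3):
    print(query, end="\n\n")
```

Each generated string can then be sent to the endpoint in turn, stopping as soon as a page comes back with fewer rows than the page size.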
a is a special SPARQL (and Notation3/Turtle) syntax to represent the rdf:type predicate - this links individual instances to owl:Class/rdfs:Class types (roughly equivalent to tables in SQL RDBMSes).
Secondly, you want to look at the properties. You can do this either by using the classes you've searched for or just looking for properties. Let's just get all the properties out of the store:
SELECT DISTINCT ?property
WHERE {
?s ?property ?o .
}
This will get all the properties, which you probably aren't interested in. This is equivalent to a list of all the row columns in SQL, but without any grouping by the table.
More useful is to see what properties are being used by instances that declare a particular class:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
}
This will get you back the properties used on any instances that satisfy the first triple - namely, that have the rdf:type of http://xmlns.com/foaf/0.1/Person.
Remember, because a rdf:Resource can have multiple rdf:type properties - classes if you will - and because RDF's data model is additive, you don't have a diamond problem. The type is just another property - it's just a useful social agreement to say that some things are persons or dogs or genes or football teams. It doesn't mean that the data store is going to contain properties usually associated with that type. The type doesn't guarantee anything in terms of what properties a resource might have.
You need to familiarise yourself with the data model and the use of SPARQL's UNION and OPTIONAL syntax. The rough mapping of rdf:type to SQL tables is just that - rough.
You might want to know what kind of entity the property is pointing to. Firstly, you probably want to know about datatype properties - those whose values are literals, the equivalent of primitives. You know, strings, integers, etc. Every RDF literal has a string lexical form, optionally accompanied by a datatype or a language tag. We can pick out just those properties whose values are literals using the SPARQL filter function isLiteral:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
FILTER isLiteral(?o)
}
We are here only going to get properties that have as their object a literal - a string, date-time, boolean, or one of the other XSD datatypes.
But what about the non-literal objects? Consider this very simple pseudo-Java class definition as an analogy:
public class Person {
int age;
Person marriedTo;
}
Using the above query, we would get back the literal that would represent age if the age property is bound. But marriedTo isn't a primitive (i.e. a literal in RDF terms) - it's a reference to another object - in RDF/OWL terminology, that's an object property. But we don't know what sort of objects are being referred to by those properties (predicates). This query will get you back properties with the accompanying types (the classes that the ?o values are members of).
SELECT DISTINCT ?property ?class
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
?o a ?class .
FILTER(!isLiteral(?o))
}
That should be enough to orient yourself in a particular dataset. Of course, I'd also recommend that you just pull out some individual resources and inspect them. You can do that using the DESCRIBE query:
DESCRIBE <http://example.org/resource>
There are some SPARQL tools - SNORQL, for instance - that let you do this in a browser. The SNORQL instance I've linked to has a sample query for exploring the possible named graphs, which I haven't covered here.
If you are unfamiliar with SPARQL, honestly, the best resource if you get stuck is the specification. It's a W3C spec but a pretty good one (they built a decent test suite so you can actually see whether implementations have done it properly or not) and if you can get over the complicated language, it is pretty helpful.
I find the following set of exploratory queries useful:
Seeing the classes:
select distinct ?type ?label
where {
?s a ?type .
OPTIONAL { ?type rdfs:label ?label }
}
Seeing the object properties:
select distinct ?objprop ?label
where {
?objprop a owl:ObjectProperty .
OPTIONAL { ?objprop rdfs:label ?label }
}
Seeing the data properties:
select distinct ?dataprop ?label
where {
?dataprop a owl:DatatypeProperty .
OPTIONAL { ?dataprop rdfs:label ?label }
}
Seeing which properties are actually used:
select distinct ?p ?label
where {
?s ?p ?o .
OPTIONAL { ?p rdfs:label ?label }
}
Seeing what entities are asserted:
select distinct ?entity ?elabel ?type ?tlabel
where {
?entity a ?type .
OPTIONAL { ?entity rdfs:label ?elabel } .
OPTIONAL { ?type rdfs:label ?tlabel }
}
Seeing the distinct graphs in use:
select distinct ?g where {
graph ?g {
?s ?p ?o
}
}
SELECT DISTINCT * WHERE {
?s ?p ?o
}
LIMIT 10
I often refer to this list of queries from the voiD project. They are mainly of a statistical nature, but not only. It shouldn't be hard to remove the COUNTs from some statements to get the actual values.
Especially with large datasets, it is important to distinguish the pattern from the noise and to understand which structures are used a lot and which are rare. Instead of SELECT DISTINCT, I use aggregation queries to count the major classes, predicates etc. For example, here's how to see the most important predicates in your dataset:
SELECT ?pred (COUNT(*) as ?triples)
WHERE {
?s ?pred ?o .
}
GROUP BY ?pred
ORDER BY DESC(?triples)
LIMIT 100
I usually start by listing the graphs in a repository and their sizes, then look at classes (again with counts) in the graph(s) of interest, then the predicates of the class(es) I am interested in, etc.
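The three steps described above (graphs with their sizes, then classes with counts, then predicates of one class) could be sketched as a small set of query builders. This is a rough Python sketch, not a fixed recipe: the function names and the graph IRI http://example.org/graph are placeholders invented for illustration, and the queries assume the data lives in named graphs.

```python
def graph_sizes():
    """Step 1: number of triples per named graph, largest first."""
    return (
        "SELECT ?g (COUNT(*) AS ?triples)\n"
        "WHERE { GRAPH ?g { ?s ?p ?o } }\n"
        "GROUP BY ?g\n"
        "ORDER BY DESC(?triples)"
    )


def class_counts(graph_iri):
    """Step 2: number of typed instances per class inside one graph."""
    return (
        "SELECT ?class (COUNT(?s) AS ?instances)\n"
        f"WHERE {{ GRAPH <{graph_iri}> {{ ?s a ?class }} }}\n"
        "GROUP BY ?class\n"
        "ORDER BY DESC(?instances)"
    )


def predicate_counts(graph_iri, class_iri):
    """Step 3: predicates used on instances of one class, most frequent first."""
    return (
        "SELECT ?pred (COUNT(*) AS ?triples)\n"
        f"WHERE {{ GRAPH <{graph_iri}> {{ ?s a <{class_iri}> ; ?pred ?o }} }}\n"
        "GROUP BY ?pred\n"
        "ORDER BY DESC(?triples)"
    )


# Placeholder graph IRI; the class is the FOAF Person class used earlier.
GRAPH = "http://example.org/graph"
PERSON = "http://xmlns.com/foaf/0.1/Person"

print(graph_sizes(), class_counts(GRAPH), predicate_counts(GRAPH, PERSON), sep="\n\n")
```

Adding a LIMIT to each query is advisable on large stores, for the same reason as in the predicate example above.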
Of course these selectors can be combined and restricted if appropriate. To see what predicates are defined for instances of type foaf:Person, and break this down by graph, you could use this:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?g ?pred (COUNT(*) as ?triples)
WHERE {
GRAPH ?g {
?s a foaf:Person .
?s ?pred ?o .
}
}
GROUP BY ?g ?pred
ORDER BY ?g DESC(?triples)
This will list each graph with the predicates in it, in descending order of frequency.
I am trying to get all the triples <subject, predicate, object> which contain the P31 property as predicate from Wikidata using their SPARQL interface. I think the query should be something like this.
SELECT ?s ?p ?o WHERE {
{
?s ?p ?o.
FILTER (?p=P31)
}
}
where P31 is the property I want.
Using SPARQL, your query could look like this
SELECT ?subject ?object WHERE {
?subject wdt:P31 ?object
}
but it will almost certainly hit a timeout, given that either P31 or P279 is expected to be set on each of the (currently) 35 million entities. You can try it with a limit though: via the GUI or as JSON.
If you really need to get a list of all the triples with a P31 property, the only possible way I'm aware of is to use the Wikidata dumps, and possibly use grep or wikidata-filter to get a subset of them.
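To give an idea of what the dump-filtering step could look like without grep, here is a minimal Python sketch. It assumes an N-Triples dump (one triple per line) in which direct claims use the http://www.wikidata.org/prop/direct/ namespace, which is what the wdt: prefix expands to; the function name is made up for this example, and the two sample lines stand in for a real dump file.

```python
WDT_P31 = "<http://www.wikidata.org/prop/direct/P31>"


def triples_with_predicate(lines, predicate=WDT_P31):
    """Yield the N-Triples lines whose predicate term equals `predicate`.

    Comparing the second whitespace-separated term avoids matching lines
    that merely mention the IRI in subject or object position.
    """
    for line in lines:
        parts = line.split(None, 2)  # subject, predicate, rest of the line
        if len(parts) == 3 and parts[1] == predicate:
            yield line.rstrip("\n")


# Two sample lines standing in for a dump: Q42 is an instance of Q5 (human).
sample = [
    "<http://www.wikidata.org/entity/Q42> <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q5> .\n",
    "<http://www.wikidata.org/entity/Q42> <http://www.wikidata.org/prop/direct/P21> <http://www.wikidata.org/entity/Q6581097> .\n",
]

for triple in triples_with_predicate(sample):
    print(triple)
```

Because the function takes any iterable of lines, a dump opened in text mode (for instance through gzip.open(..., "rt")) can be streamed through it without loading the whole file into memory.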
Try this,
SELECT ?s ?p ?o WHERE {
{
?s ?p ?o.
FILTER(?p = wdt:P31)
}
}
LIMIT 20
Refer to this link
I have installed a local virtuoso server and imported dbpedia data into it. I found a range of SPARQL commands that don't work in isql. For example I ran this query in my isql shell.
SPARQL SELECT ?s GROUP_CONCAT (?obj, ' ') as ?artist_list WHERE { ?s a dbpedia-owl:Single ;(dbpedia-owl:artist|dbpedia-owl:producer) ?obj } limit 10
It first complains about the | in (dbpedia-owl:artist|dbpedia-owl:producer), then for the GROUP_CONCAT.
I did some research into Virtuoso docs and did the following
EDIT1
I tried to check different cases,
1- group_concat
SPARQL select ?s (group_concat(?obj; separator='|') as ?artist_list) FROM <http://ja.dbpedia.org> where { ?s a dbpedia-owl:Single ; (dbpedia-owl:artist) ?obj } group by ?s limit 10;
SQL> syntax error at 'group_concat' before '('
2- using values
SPARQL select ?s FROM <http://ja.dbpedia.org> where { values ?sType {dbpedia-owl:Song dbpedia-owl:Single }. ?s a ?sType} limit 10;
*** Error 37000: [Virtuoso Driver][Virtuoso Server]SQ074: Line 1: SP030: SPARQL compiler, line 1: syntax error at 'values' before '?sType'
3- using |
SPARQL select * FROM <http://ja.dbpedia.org> where { ?s (dbpedia-owl:artist|dbpedia-owl:producer) ?o } limit 10
*** Error 37000: [Virtuoso Driver][Virtuoso Server]SQ074: Line 1: SP030: SPARQL compiler, line 0: Invalid character in SPARQL expression at '|'
What am I doing wrong? All of the above SPARQL queries work on standard SPARQL endpoints.
I don't think that Virtuoso supports all of SPARQL 1.1, but your query isn't legal SPARQL either. (Virtuoso does accept some stuff that's not quite SPARQL, but your query is within the realm of SPARQL.) I suggest you take a look at sparql.org's query validator. You'll find at least the following issues:
You didn't define the dbpedia-owl: prefix. (You might have that in the endpoint though, so it might not be a problem.)
It needs to be (group_concat(…) as …) with parentheses. (The case doesn't matter, though, so GROUP_CONCAT is fine. I think Virtuoso will accept it without the parens, but that doesn't make it legal SPARQL.)
In group_concat, you need to use a semicolon and write separator.
If you're using aggregates like group_concat, then you're grouping by some variables, and you can't select a variable that you're not grouping by. That means that you need to group by ?s.
Once you fix those things, you'd end up with something like:
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
select ?s (group_concat(?obj; separator=' ') as ?artist_list)
where {
?s a dbpedia-owl:Single ;
(dbpedia-owl:artist|dbpedia-owl:producer) ?obj
}
group by ?s
limit 10
That's legal SPARQL 1.1. If Virtuoso still complains about that, you could use a values block instead of the property path, and so have the following, which is equivalent:
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
select ?s (group_concat(?obj; separator=' ') as ?artist_list)
where {
values ?p { dbpedia-owl:artist dbpedia-owl:producer }
?s a dbpedia-owl:Single ;
?p ?obj
}
group by ?s
limit 10
If you still get complaints about values (but DBpedia's endpoint supports values, and it's Virtuoso; how old a version are you installing?), you can use union:
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
select ?s (group_concat(?obj; separator=' ') as ?artist_list)
where {
?s a dbpedia-owl:Single .
{ ?s dbpedia-owl:artist ?obj } UNION
{ ?s dbpedia-owl:producer ?obj }
}
group by ?s
limit 10
If you want some SPARQL 1.1 support, you'll need to use at least Virtuoso 7.0.0. The release notes include:
2013-08-05 -- Virtuoso Open-Source Edition 7.0.0 Released
Added support for SPARQL 1.1 BIND and VALUES clauses
Added support for SPARQL 1.1 Functions and Aggregates
Added support for ?graph parameter in SPARQL 1.1 Graph Protocol
Fixed issues with Transitivity, Inference, and SPARQL 1.1 Property Paths
I want to query a triple store which is multilingual.
Query that works:
SELECT * WHERE {?s ?p "sdfsdf"@en}
I want "sdfsdf" to be a variable in general, like ?o@en.
How should I query then?
Filter by the language of the object:
select * where { ?s ?p ?o . filter (lang(?o) = "en") }
Note that your results will be of the form "sdfsdf"@en, rather than just the lexical form "sdfsdf". (You can do that additional work in SPARQL 1.1, for example by projecting str(?o), and in processors like ARQ using extensions.)
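If the results are fetched in the standard SPARQL JSON results format, the lexical form and the language tag already arrive as separate fields, so the split can also be done on the client side. The following is a small Python sketch under that assumption; the helper name literal_parts and the hand-written response are purely illustrative.

```python
import json


def literal_parts(term):
    """Split one term of a SPARQL JSON result into (lexical form, language tag).

    The tag is None for plain or datatyped literals and for IRIs.
    """
    return term["value"], term.get("xml:lang")


# Hand-written response in the shape of the SPARQL 1.1 JSON results format.
response = json.loads("""
{
  "head": {"vars": ["o"]},
  "results": {"bindings": [
    {"o": {"type": "literal", "xml:lang": "en", "value": "sdfsdf"}}
  ]}
}
""")

for row in response["results"]["bindings"]:
    print(literal_parts(row["o"]))  # prints ('sdfsdf', 'en')
```

This keeps the query itself simple and leaves the choice of what to do with the tag to the calling code.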