SPARQL Distinct pairs

I have a dataset in which several documents have the same author. I need to get the distinct pairs of such documents. I tried the following:
SELECT DISTINCT ?d1 ?d2 WHERE {
?d1 myns:creator ?x.
?d2 myns:creator ?y.
FILTER (?x=?y && ?d1!=?d2).
}
GROUP BY ?d1 ?d2
But with this, both (DOC1, DOC2) and (DOC2, DOC1) appear in the result. I need to get rid of one of each mirrored pair.
Here is the complete dataset:
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix myns: <http://my.local.namespace#> .
_:doc1 rdf:type myns:Document.
_:doc1 myns:creator _:Pete.
_:doc1 myns:year "2000"^^xsd:integer.
_:doc1 myns:publisher _:p1.
_:doc2 rdf:type myns:Document.
_:doc2 myns:creator _:John.
_:doc2 myns:year "2004"^^xsd:integer.
_:doc2 myns:publisher _:p2.
_:doc3 rdf:type myns:Document.
_:doc3 myns:creator _:Pete.
_:doc3 myns:publisher _:p3.
_:doc4 rdf:type myns:Document.
_:doc4 myns:creator _:Bob.
_:doc4 myns:year "2010"^^xsd:integer.
_:doc4 myns:publisher _:p2.
_:Pete rdf:type myns:Person.
_:Pete myns:knows _:Bob.
_:Pete myns:knows _:John .
_:John rdf:type myns:Person.
_:John myns:age "29"^^xsd:integer.
_:John myns:knows _:Bob.
_:Bob rdf:type myns:Person.
_:Bob myns:age "35"^^xsd:integer.
The result I get after executing the query is:
D1 D2
_:891f1e98-b411-4e54-9533-18d530f09c6ddoc1 _:891f1e98-b411-4e54-9533-18d530f09c6ddoc3
_:891f1e98-b411-4e54-9533-18d530f09c6ddoc3 _:891f1e98-b411-4e54-9533-18d530f09c6ddoc1
As you can see, technically both rows are the same pair. I just need one of them. I am not sure about the environment details, but the Sesame framework is being used.

This will work in some systems:
SELECT ?d1 ?d2 WHERE {
?d1 myns:creator ?x.
?d2 myns:creator ?y.
FILTER (?x=?y && STR(IRI(?d1)) < STR(IRI(?d2))).
}
?d1 and ?d2 are going to be blank nodes. But blank nodes are blank: they carry no label that a query can rely on.
So to provide the ordering for <, we need some kind of query-wide label or value associated with each one.
Your data does not have any distinguishing triples for each person. It would be better to put real names in the data:
_:Pete rdfs:label "Pete" .
Even better, use the FOAF vocabulary.
Some systems allow blank nodes in IRI(); technically, that is an extension of the SPARQL specification. You can then take the STR form and compare. That works on your data for me (Apache Jena). You don't say which RDF system you are using.
The best solution is put distinguishing information into the data.

You can do this with a little trick: turn the != into a < (or >), and convert the values to strings, so you can do lexical comparisons:
SELECT DISTINCT ?d1 ?d2 WHERE {
?d1 myns:creator ?x.
?d2 myns:creator ?y.
FILTER (?x=?y && STR(?d1) < STR(?d2)).
}
GROUP BY ?d1 ?d2
This works on the idea that for any two identifiers that are not equal, one is always greater than the other (by lexical ordering). So of the two mirrored pairs, only one will actually be selected.
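The effect of the ordering trick is easier to see outside SPARQL. The following is a minimal Python sketch, not a SPARQL engine: it assumes the documents already have orderable string identifiers (hypothetical myns: IRIs standing in for the blank nodes) and applies the two FILTER conditions to every combination of (document, creator) rows.

```python
from itertools import product

# (document, creator) rows from the question's data. The myns: IRIs are
# hypothetical stand-ins for the blank nodes, which have no orderable label.
ROWS = [
    ("myns:doc1", "myns:Pete"),
    ("myns:doc2", "myns:John"),
    ("myns:doc3", "myns:Pete"),
    ("myns:doc4", "myns:Bob"),
]


def same_creator_pairs(rows, ordered=True):
    """Return document pairs that share a creator.

    ordered=False imitates FILTER(?x = ?y && ?d1 != ?d2);
    ordered=True imitates FILTER(?x = ?y && STR(?d1) < STR(?d2)).
    """
    pairs = set()
    for (d1, x), (d2, y) in product(rows, repeat=2):
        if x != y:
            continue
        keep = d1 < d2 if ordered else d1 != d2
        if keep:
            pairs.add((d1, d2))
    return sorted(pairs)


print(same_creator_pairs(ROWS, ordered=False))
# [('myns:doc1', 'myns:doc3'), ('myns:doc3', 'myns:doc1')]
print(same_creator_pairs(ROWS, ordered=True))
# [('myns:doc1', 'myns:doc3')]
```

The strict < also drops the (doc1, doc1) self-pair, so no separate != test is needed.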
Update: now that you've shown your data, we can see that the problem is that you are not using IRIs to distinguish your documents, but blank nodes. The above query does not work because, according to the SPARQL standard, blank nodes are unordered (so directly comparing them via < does not work), and moreover, the STR function is defined to operate only on literals or IRIs, not on blank nodes.
The best solution is to change your data and make sure that you use proper IRIs, because regardless of whether you can somehow make this query work on this data, the result would be almost useless: blank nodes have no meaning outside their local scope, so the document identifiers that your query returns cannot really be reused. For example, you would not be able to do a SPARQL query that gets any properties specifically for the document _:doc1 (although, to be fair, Sesame has a workaround for this in the API).
A very simple way to change your blank nodes to IRIs, by the way, is to replace all occurrences of _: in your turtle file with myns:.
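If you prefer to script that replacement, here is a rough Python sketch. The function name bnodes_to_iris is hypothetical, and the regular expression only rewrites _: where it starts a whitespace-delimited token; it does not parse Turtle, so a _: inside a string literal that follows a space would also be rewritten.

```python
import re

# "_:" only where it begins a whitespace-delimited token (or a line).
BNODE_AT_TOKEN_START = re.compile(r"(^|\s)_:", re.MULTILINE)


def bnodes_to_iris(turtle_text: str, prefix: str = "myns:") -> str:
    """Rewrite blank-node labels such as _:doc1 into prefixed names such as
    myns:doc1. The prefix must be declared in the file (@prefix myns: ...)."""
    # A function replacement avoids backslash interpretation in `prefix`.
    return BNODE_AT_TOKEN_START.sub(lambda m: m.group(1) + prefix, turtle_text)


sample = "_:doc1 rdf:type myns:Document.\n_:doc1 myns:creator _:Pete.\n"
print(bnodes_to_iris(sample))
```

Text such as a_:b inside an IRI is left untouched, because the _: there does not start a token.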

Related

Sparql query to read from all named graphs without knowing the names

I am looking to run a SPARQL query over any dataset. We don't know the names of the named graphs in the datasets.
There is lots of documentation, and there are examples of selecting from named graphs when you know the name(s) of the named graph(s). There are also examples showing how to list the named graphs.
We are running Jena from Java, so it would be possible to run two queries: the first gets the named graphs, and we inject these into the second.
But surely you can write a single query that reads from all named graphs when you don't know their names?
Note: we are looking to stay away from using default graph/s as their behaviour seems implementation dependent.
Example:
{
?s foaf:name ?name ;
vCard:nickname ?nickName .
}
If you want the pattern to match within one graph and wish to try each graph, use the GRAPH ?g form.
GRAPH ?g
{ ?s foaf:name ?name ;
vCard:nickname ?nickName .
}
If you want a query where the pattern matches across named graphs (e.g. foaf:name in one graph and vCard:nickname in another, same subject),
then set the union default graph option, tdb2:unionDefaultGraph true. The default graph as seen by the query is then the union (strictly, the RDF merge, with no duplicates) of all the named graphs. Use the pattern as originally given.
Fuseki configuration file extract:
:dataset_tdb2 rdf:type tdb2:DatasetTDB2 ;
tdb2:location "DB2" ;
## Optional - with union default for query and update WHERE matching.
tdb2:unionDefaultGraph true ;
.
In code, not Fuseki, the application can use Dataset.getUnionModel().
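To make the difference between the two approaches concrete, here is a small Python model over hypothetical quads (graph, subject, predicate, object). It is not Jena or TDB2 code; it only imitates the matching rules: GRAPH ?g requires both triples to come from the same named graph, while the union default graph first merges all named graphs and then matches.

```python
# Hypothetical quads: (graph, subject, predicate, object).
QUADS = [
    ("ex:g1", "ex:alice", "foaf:name", '"Alice"'),
    ("ex:g1", "ex:alice", "vCard:nickname", '"Al"'),
    ("ex:g2", "ex:bob", "foaf:name", '"Bob"'),
    ("ex:g3", "ex:bob", "vCard:nickname", '"Bobby"'),
]


def match_per_graph(quads):
    """Like GRAPH ?g { ?s foaf:name ?name ; vCard:nickname ?nick }:
    both triples must come from the same named graph."""
    rows = set()
    for g, s, p, name in quads:
        if p != "foaf:name":
            continue
        for g2, s2, p2, nick in quads:
            if (g2, s2, p2) == (g, s, "vCard:nickname"):
                rows.add((g, s, name, nick))
    return sorted(rows)


def match_union_default(quads):
    """Like the plain pattern over a union default graph: graph names are
    dropped first (duplicate triples collapse into one), so the two triples
    may come from different named graphs."""
    triples = {(s, p, o) for _, s, p, o in quads}
    rows = set()
    for s, p, name in triples:
        if p != "foaf:name":
            continue
        for s2, p2, nick in triples:
            if (s2, p2) == (s, "vCard:nickname"):
                rows.add((s, name, nick))
    return sorted(rows)


print(match_per_graph(QUADS))      # only ex:alice, found inside ex:g1
print(match_union_default(QUADS))  # ex:alice and ex:bob
```

ex:bob only matches in the union case, because his name and nickname live in different named graphs.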

How to Read Specific Range value of an Object Property

I'm new to working with ontologies and I'm having trouble getting my SPARQL query to work. I'm trying to read the value of a specific object property that has multiple ranges.
I tried this query, but it returns all of the object properties:
PREFIX ns: <http://www.semanticweb.org/pavilion/ontologies/2017/5/untitled-ontology-66#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT *
WHERE {
ns:star1086 ns:possesses ?z .
}
What I want is to read only the value of the desired range. Thanks in advance.
I think there is something wrong with your ontology.
Specifying multiple ranges for your predicate creates an intersection. Take the following statement:
?star ns:possesses ?something
Then ?something is a SpectralType and a StarTemperature and a StarCoordinates and a StarName at the same time, which is not what you want.
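The intersection follows from how ranges are interpreted: each range declaration independently lets a reasoner conclude that the object is an instance of that class. Here is a minimal Python sketch of the RDFS range rule, assuming a hypothetical object ns:something and the four classes from your ontology:

```python
# The four rdfs:range declarations of ns:possesses from the question.
RANGES = {
    "ns:possesses": [
        "ns:SpectralType",
        "ns:StarTemperature",
        "ns:StarCoordinates",
        "ns:StarName",
    ],
}

# One asserted triple; ns:something is a hypothetical object.
DATA = [("ns:star1086", "ns:possesses", "ns:something")]


def infer_types(triples, ranges):
    """RDFS range rule: (p rdfs:range C) and (s p o) entail (o rdf:type C).
    Each declared range fires on its own, so the object gets every class."""
    inferred = set()
    for _, p, o in triples:
        for cls in ranges.get(p, []):
            inferred.add((o, "rdf:type", cls))
    return inferred


for triple in sorted(infer_types(DATA, RANGES)):
    print(triple)
```

A single range whose class expression is a union would instead produce one (unnamed) class, which is the fix suggested next.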
Instead, you should use unions. Using unions, you can state that the object of a ns:possesses statement can be either a SpectralType or a StarTemperature or a StarCoordinates or a StarName. Then, in your SPARQL query you can write the following to get only statements from a single type.
SELECT * WHERE {
ns:star1086 ns:possesses ?z .
?z a ns:SpectralType .
}
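The second triple pattern is an ordinary join on ?z. As a hedged illustration, here is the same selection in plain Python over hypothetical instance data (ns:G2V and ns:temp5778 are invented names), where each possessed object carries one explicit type:

```python
# Hypothetical instance data: each possessed object has one explicit type.
TRIPLES = [
    ("ns:star1086", "ns:possesses", "ns:G2V"),
    ("ns:star1086", "ns:possesses", "ns:temp5778"),
    ("ns:G2V", "rdf:type", "ns:SpectralType"),
    ("ns:temp5778", "rdf:type", "ns:StarTemperature"),
]


def possessed_of_type(triples, star, cls):
    """Equivalent of: SELECT ?z { star ns:possesses ?z . ?z a cls }"""
    typed = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    return sorted(
        o for s, p, o in triples
        if s == star and p == "ns:possesses" and o in typed
    )


print(possessed_of_type(TRIPLES, "ns:star1086", "ns:SpectralType"))
```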
In Protégé, to write a union, open the class expression editor (by clicking on the "plus" next to "Ranges" for instance) and separate the different members with or :
SpectralType or StarTemperature or StarCoordinates or StarName
And click "OK" to create the new range.
Further considerations
Let's take a step back and look at your ontology.
You should not use a single predicate to store all this information in the first place. Instead, I suggest you use different sub-properties so that your graphs and queries carry more semantic value.
Furthermore, StarName and Temperature are literal values. You should not use classes for that. Use datatype properties instead.
Here is a Gist you can download and open in Protégé. It contains some sample data so you can try the following SPARQL queries.
PREFIX : <http://www.richarddegenne.com/ontology/astronomy#>
# Get all statements about :star1086
SELECT * WHERE {
:star1086 ?predicate ?object
}
# Get some statement about :star1086
SELECT * WHERE {
VALUES ?predicate {
:hasSpectralType :temperature
}
:star1086 ?predicate ?object
}
# Ask whether a given pattern is true
ASK WHERE {
:star1086 :hasSpectralType :yellowDwarf
}
# Filter stars based on their temperature
# Note: You might want to add more stars with different temperature
# if you want useful results.
SELECT ?star WHERE {
?star :temperature ?temperature
FILTER(?temperature > 5000)
}

How to execute SPARQL Query (Call a service) Over extracted subgraph?

I have an RDF graph with several types of relations (some sharing a prefix, some with different prefixes). I need to call a service over the graph but filter out some relations.
Example:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix myPref: <http://www.myPref.com/>.
@prefix otherPref: <http://www.otherPref.com/>.
myPref:1
myPref:label "1" ;
myPref:solid myPref:2 ;
myPref:dotted myPref:4 ;
otherPref:dashed myPref:3 ;
otherPref:dashed2 myPref:3 .
myPref:2
myPref:label "2" ;
myPref:solid myPref:3 .
myPref:3
myPref:label "3" .
myPref:4
myPref:label "4" ;
myPref:dotted myPref:3 .
I would like to run the service call over an extracted subgraph containing only the solid and dotted relations (in this particular case, running a service that calculates the shortest path from 1 to 3, I want to exclude the direct otherPref links from 1 to 3).
I run the service (over the entire graph) like this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX myPref: <http://www.myPref.com/>
PREFIX otherPref: <http://www.otherPref.com/>
PREFIX gas: <http://www.bigdata.com/rdf/gas#>
SELECT ?sp ?out {
SERVICE gas:service {
gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" .
gas:program gas:in myPref:1 .
gas:program gas:target myPref:3 .
gas:program gas:out ?out .
gas:program gas:out1 ?sp .
}
}
How can I extract a subgraph containing only the links I want (dotted and solid) and then run the service call over the extracted subgraph?
SPARQL doesn't provide any functionality for querying a constructed graph, unfortunately. I've come across places where it would make some queries very easy. Some endpoints do have extensions to support it, though. I think that dotNetRDF might support it. There are probably a few aspects: in many cases, it's not actually necessary; if the endpoint supports updates, you can create a new named graph and construct into it, and then launch a second query against it (which is pretty much what you're asking for, but in two steps); this could be a very expensive operation, so endpoints might disable it anyway, even if it was directly supported.
The first point, though, that it's often not necessary, appears to apply here.
I need to call a service over the graph but filtering out some relations.
In this case, you can query over the subgraph that you want, I think, by using property paths. You can ask for paths built from just solid and dotted edges like:
?s myPref:solid|myPref:dotted ?t
If you want an arbitrary path of them, you can repeat it:
?s (myPref:solid|myPref:dotted)+ ?t
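As an illustration of what restricting the graph to these two predicates does, here is a plain-Python breadth-first search over the triples from the question. It is not the Blazegraph GAS SSSP service, just a sketch of shortest-path search over the subgraph that (myPref:solid|myPref:dotted)+ walks; shortest_path and its parameters are hypothetical names.

```python
from collections import deque

# The example graph from the question as (subject, predicate, object).
EDGES = [
    ("myPref:1", "myPref:solid", "myPref:2"),
    ("myPref:1", "myPref:dotted", "myPref:4"),
    ("myPref:1", "otherPref:dashed", "myPref:3"),
    ("myPref:1", "otherPref:dashed2", "myPref:3"),
    ("myPref:2", "myPref:solid", "myPref:3"),
    ("myPref:4", "myPref:dotted", "myPref:3"),
]


def shortest_path(edges, source, target, allowed):
    """Breadth-first search that follows only predicates in `allowed`.
    Returns the node list of one shortest path, or None if unreachable."""
    adjacency = {}
    for s, p, o in edges:
        if p in allowed:
            adjacency.setdefault(s, []).append(o)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path(EDGES, "myPref:1", "myPref:3",
                    {"myPref:solid", "myPref:dotted"}))
# ['myPref:1', 'myPref:2', 'myPref:3']
```

With all four predicates allowed, the direct dashed edge wins and the path is just 1 to 3.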
If you have unique paths between sources and destinations, then you can figure out the lengths of paths using the standard "count the ways of splitting the path" technique:
select ?s ?t (count(?u) as ?length) {
?s (myPref:solid|myPref:dotted)* ?u .
?u (myPref:solid|myPref:dotted)+ ?t .
}
group by ?s ?t
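The counting technique can be checked with a small Python model. Here reachable() plays the role of path* (include_start=True) and path+ (include_start=False), and split_count() counts the intermediate nodes u with s path* u and u path+ t; both names are hypothetical. On the question's graph there are two paths from 1 to 3 (via 2 and via 4), so the count there is 3 rather than a path length, which is exactly why the technique needs unique paths.

```python
ALLOWED = {"myPref:solid", "myPref:dotted"}

# The example graph from the question as (subject, predicate, object).
EDGES = [
    ("myPref:1", "myPref:solid", "myPref:2"),
    ("myPref:1", "myPref:dotted", "myPref:4"),
    ("myPref:1", "otherPref:dashed", "myPref:3"),
    ("myPref:2", "myPref:solid", "myPref:3"),
    ("myPref:4", "myPref:dotted", "myPref:3"),
]


def reachable(edges, start, allowed, include_start):
    """Nodes reachable from start via allowed predicates:
    include_start=True behaves like path*, False like path+."""
    adjacency = {}
    for s, p, o in edges:
        if p in allowed:
            adjacency.setdefault(s, set()).add(o)
    seen = {start} if include_start else set()
    stack = list(adjacency.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency.get(node, ()))
    return seen


def split_count(edges, s, t, allowed):
    """Count u with (s path* u) and (u path+ t); on a unique path this
    equals the number of edges between s and t."""
    return sum(1 for u in reachable(edges, s, allowed, True)
               if t in reachable(edges, u, allowed, False))


print(split_count(EDGES, "myPref:1", "myPref:2", ALLOWED))  # 1
print(split_count(EDGES, "myPref:1", "myPref:3", ALLOWED))  # 3 (two paths)
```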

Querying WikiData, difference between p and wdt default prefix

I am new to Wikidata and I can't figure out when I should use the
wdt prefix (http://www.wikidata.org/prop/direct/)
and when I should use the
p prefix (http://www.wikidata.org/prop/)
in my SPARQL queries. Can someone explain what each of these means and what the difference is?
Things in the p: namespace are used to select statements. Things in the wdt: namespace are used to select entities. Entity selection, with wdt:, allows you to simplify or summarize more complex queries involving statement selection.
When you see a p: you are usually going to see a ps: or pq: shortly following. This is because you rarely want a list of statements; you usually want to know something about those statements.
This example is a two-step process showing you all the graffiti in Wikidata:
SELECT ?graffiti ?graffitiLabel
WHERE
{
?graffiti p:P31 ?statement . # P31 statements about ?graffiti
?statement ps:P31 wd:Q17514 . # which state something is graffiti
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Two different versions of the P31 property are used here, housed in different namespaces. Each version comes with different expectations about how it connects to other items. Things in the p: namespace connect entities to statements, and things in the ps: namespace connect statements to values. In the example, p:P31 is used to select statements about an entity. The entity will be graffiti, but we do not specify that until the next line, where ps:P31 is used to select the values (objects) of the statements, specifying that the value should be wd:Q17514 (graffiti).
So, that's kind of complicated! The wdt: namespace is supposed to make this kind of query simpler. The example could be rewritten as:
SELECT ?graffiti ?graffitiLabel
WHERE
{
?graffiti wdt:P31 wd:Q17514 . # entities that are graffiti
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
This is now one line shorter because we are no longer looking for statements about graffiti, but for graffiti itself. The dual p: and ps: linkages are summarized with a wdt: version of the same P31 property. However, be aware:
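This technique only reaches the "best" statements about an item: wdt: links an item directly to the values of its truthy statements, meaning those with the best non-deprecated rank for that property, and it drops qualifiers and references. (The "t" in wdt: stands for "truthy".)
Information available to wdt: is therefore missing some facts, sometimes. If a property has a preferred-rank statement, its normal-rank statements are not available through wdt:, and deprecated statements never are. That is why, in my experience, a p: and ps: query often returns a few more results than a wdt: query.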
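The rank rule behind wdt: can be sketched in a few lines of Python. This is a model of the selection, not Wikidata's code; the item ex:item, the properties P_A and P_B and all values are hypothetical. truthy() keeps, per item and property, only the statements with the best non-deprecated rank, while all_statement_values() keeps every statement, as p: with ps: does.

```python
from collections import defaultdict

# Hypothetical statements: (item, property, value, rank).
STATEMENTS = [
    ("ex:item", "P_A", "ex:old", "normal"),
    ("ex:item", "P_A", "ex:current", "preferred"),
    ("ex:item", "P_B", "ex:v1", "normal"),
    ("ex:item", "P_B", "ex:v2", "normal"),
    ("ex:item", "P_B", "ex:wrong", "deprecated"),
]


def truthy(statements):
    """Values reachable via wdt:: per (item, property), only statements
    with the best non-deprecated rank (preferred if any, else normal)."""
    by_key = defaultdict(list)
    for item, prop, value, rank in statements:
        if rank != "deprecated":
            by_key[(item, prop)].append((value, rank))
    result = {}
    for key, values in by_key.items():
        best = "preferred" if any(r == "preferred" for _, r in values) else "normal"
        result[key] = sorted(v for v, r in values if r == best)
    return result


def all_statement_values(statements):
    """Values reachable via p: then ps:: every statement, whatever its rank."""
    result = defaultdict(list)
    for item, prop, value, _ in statements:
        result[(item, prop)].append(value)
    return {key: sorted(values) for key, values in result.items()}


print(truthy(STATEMENTS))
print(all_statement_values(STATEMENTS))
```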
If you go to the Wikidata item page for Barack Obama at https://www.wikidata.org/wiki/Q76 and scroll down, you see the entry for the "spouse" property P26.
Think of the p: prefix as a way to get to the entire white box on the right side of that entry.
In order to get to the information inside the white box, you need to dig deeper.
In order to get to the main part of the information ("Michelle Obama"), you combine the p: prefix with the ps: prefix like this:
SELECT ?spouse WHERE {
wd:Q76 p:P26 ?s .
?s ps:P26 ?spouse .
}
The variable ?s is an abstract statement node (aka the white box).
You can get the same information with only one triple in the body of the query by using the wdt: prefix:
SELECT ?spouse WHERE {
wd:Q76 wdt:P26 ?spouse .
}
So why would you ever use p:?
You might have noticed that the white box also contains meta information ("start time" and "place of marriage").
In order to get to the meta information, you combine the p: prefix with the pq: prefix.
The following example query returns all the information together with the statement node:
SELECT ?s ?spouse ?time ?place WHERE {
wd:Q76 p:P26 ?s .
?s ps:P26 ?spouse .
?s pq:P580 ?time .
?s pq:P2842 ?place .
}
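Structurally, the statement node is just an intermediate resource. A hedged Python model with placeholder values (ex:stmt1, ex:spouseA and the rest are invented, not real Wikidata data) shows the p:/ps:/pq: hops. As in the query, every pattern is required, so a statement missing either qualifier produces no row unless the qualifier patterns are wrapped in OPTIONAL.

```python
# Placeholder triples mimicking the statement-node layout.
TRIPLES = [
    ("wd:Q76", "p:P26", "ex:stmt1"),          # item -> statement node
    ("ex:stmt1", "ps:P26", "ex:spouseA"),     # statement -> main value
    ("ex:stmt1", "pq:P580", "ex:startTime"),  # statement -> qualifier
    ("ex:stmt1", "pq:P2842", "ex:place"),     # statement -> qualifier
    ("wd:Q76", "wdt:P26", "ex:spouseA"),      # truthy shortcut
]


def objects(triples, subject, predicate):
    """All objects o with (subject predicate o) in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def spouse_with_qualifiers(triples, item):
    """Mirror of the p:/ps:/pq: query: all four hops must match."""
    rows = []
    for stmt in objects(triples, item, "p:P26"):
        for spouse in objects(triples, stmt, "ps:P26"):
            for time in objects(triples, stmt, "pq:P580"):
                for place in objects(triples, stmt, "pq:P2842"):
                    rows.append((stmt, spouse, time, place))
    return rows


print(spouse_with_qualifiers(TRIPLES, "wd:Q76"))
print(objects(TRIPLES, "wd:Q76", "wdt:P26"))  # one hop, no qualifiers
```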
They're simply namespace prefixes, basically a shortcut for full URIs. So given wdt:Apples, the full URI is http://www.wikidata.org/prop/direct/Apples and given p:fruitType the URI is http://www.wikidata.org/prop/fruitType.
Prefixes/namespaces have no other meaning, they are simply ways to define the name of something with URL format. However conventions, such as defining properties in http://www.wikidata.org/prop/, are useful to separate the meanings of terms, so 'direct' is likely a sub-type of property as well (in this case having to do with wikipedia dumps).
For the specifics, you'd need to hope the authors have exposed some naming convention, or be caught in a loop of "was it p:P51 or p:P15 or maybe wdt:P51?". And may luck be with you because the "semantics" of semantic technology have been lost.

Inquiry on example of explicit join in the SPARQL

I have following sparql query(from the book, semantic web primer):
select ?n
where
{
?x rdf:type uni:Course;
uni:isTaughtBy :949352 .
?c uni:name ?n .
FILTER(?c=?x) .
}
In this case, I guess this code is the same as the following:
Select ?n
Where
{
?x rdf:type uni:course;
uni:isTaughtBy :949352 .
?x uni:name ?n .
}
Could this query lead to an error?
No, I don't see why it should give you an error or produce wrong results. Just make sure to always use the right case (uni:Course vs. uni:course): SPARQL keywords such as SELECT are case-insensitive, but IRIs and prefixed names are case sensitive.
To be honest, the first version seems rather obscure as it uses a FILTER without a real need for it. That said, you may further slim down your query if you wish:
SELECT ?n
WHERE
{
?x rdf:type uni:Course;
uni:isTaughtBy :949352;
uni:name ?n .
}
However, keep in mind that saving characters does not always lead to improved readability.
For your example, yes, the queries are identical and there would be no value in using a FILTER over a join.
However, the reason you might use the FILTER form is the difference in semantics between joins and the = operator.
Joins require the values of the variables to be exactly the same RDF term, whereas = tests value equality: do the two RDF terms represent the same value? This is primarily a concern when one or both of the values are literals.
It's easier to see with a specific example: assume ?x = 4 and ?c = 4.0 (a bad example for your query, but it illustrates the point).
?x = ?c would give true, while a join would give no results because they are not the exact same term.
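Here is a minimal Python sketch of the two checks, treating a literal as a (lexical form, datatype) pair. It handles only xsd:integer and xsd:decimal and falls back to term comparison for anything else, which is a simplification of the full SPARQL operator mapping:

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"
NUMERIC = {XSD + "integer", XSD + "decimal"}


def same_term(a, b):
    """Join semantics: both literals must be the identical RDF term,
    i.e. the same lexical form and the same datatype."""
    return a == b


def value_equal(a, b):
    """Simplified = semantics: numeric literals are compared by value;
    anything else falls back to term comparison in this sketch."""
    if a[1] in NUMERIC and b[1] in NUMERIC:
        return Decimal(a[0]) == Decimal(b[0])
    return a == b


x = ("4", XSD + "integer")
c = ("4.0", XSD + "decimal")
print(same_term(x, c))    # False: a join finds no match
print(value_equal(x, c))  # True: FILTER(?x = ?c) passes
```

The same gap appears within one datatype: "04"^^xsd:integer and "4"^^xsd:integer are different terms with the same value.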