Inquiry on an example of an explicit join in SPARQL

I have the following SPARQL query (from the book A Semantic Web Primer):
select ?n
where
{
?x rdf:type uni:Course ;
   uni:isTaughtBy :949352 .
?c uni:name ?n .
FILTER(?c=?x) .
}
In this case, I guess this query is the same as the following:
Select ?n
Where
{
?x rdf:type uni:course;
uni:isTaughtBy :949352 .
?x uni:name ?n .
}
Would this query lead to an error?

No, I don't see why it should give you an error or produce wrong results. Just make sure to always use the right case (uni:Course vs. uni:course), as SPARQL is case sensitive.
To be honest, the first version seems rather obscure as it uses a FILTER without a real need for it. That said, you may further slim down your query if you wish:
SELECT ?n
WHERE
{
?x rdf:type uni:Course;
uni:isTaughtBy :949352;
uni:name ?n .
}
However, keep in mind that saving characters does not always lead to improved readability.

For your example, yes, the queries are equivalent, and there would be no value in using a FILTER over a join.
However, the reason why you might use the FILTER form is the difference in semantics between joins and the = operator.
Joins require that the values of the variables be exactly the same RDF term, whereas = tests value equality: do the two RDF terms represent the same value? This is primarily a concern when one or both of the values may be literals.
It's easier to see with a specific example. Assume ?x = 4 and ?c = 4.0 (which is a bad example for your query but illustrates the point).
?x = ?c would give true, while a join would give no results because the two are not the exact same term.
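The distinction can be sketched outside SPARQL. The following Python snippet is only an illustration, not how any SPARQL engine is implemented: the Literal class and both helper functions are made up for this example, and only two numeric datatypes are handled.

```python
from decimal import Decimal
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    """A simplified RDF literal (hypothetical model): lexical form plus datatype IRI."""
    lexical: str
    datatype: str


# Only these two numeric types are covered in this sketch.
NUMERIC = {XSD + "integer", XSD + "decimal"}


def same_term(a: Literal, b: Literal) -> bool:
    """Join semantics: lexical form and datatype must both match exactly."""
    return a == b


def value_equal(a: Literal, b: Literal) -> bool:
    """Rough stand-in for the = operator: numeric literals compare by value."""
    if a.datatype in NUMERIC and b.datatype in NUMERIC:
        return Decimal(a.lexical) == Decimal(b.lexical)
    return same_term(a, b)


x = Literal("4", XSD + "integer")    # the literal 4
c = Literal("4.0", XSD + "decimal")  # the literal 4.0

print(same_term(x, c))    # False: a join on these two would not match
print(value_equal(x, c))  # True: FILTER(?x = ?c) would pass
```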

Related

DBPedia SPARQL, return certain number of relevant page URIs for entity EXCEPT the URIs where the entity belongs to a set of subclasses of Owl:Thing

Looking for SPARQL query to do the following:
For example, I have the word Apple. Apple may refer to the organization Apple_Inc or to the Species of Plants class as per the ontology. owl:Thing has a subclass called Species, so I want to return the most relevant/maximum-hit URIs where the keyword Apple does not belong to the Species subclass. So when you return all the URIs, http://dbpedia.org/page/Apple should not be one of them, and neither should ANY relevant link that comes under the Species subclass.
By maximum-hit/most relevant I mean the top returned results that match the query! Like when you access the PrefixSearch (i.e. Autocomplete) API, it has the parameter called MaxHits.
For example http://lookup.dbpedia.org/api/search/PrefixSearch?QueryClass=&MaxHits=2&QueryString=berl is a link where you want to return the top 2 URIs that match the QueryString=berl.
I'm struggling to even explain the work I've done so far, because I don't understand the structure or how to formulate a proper query.
With respect to negation in SPARQL, I found a relevant portion of the documentation in the link here, but I do not know how to proceed from there, and I cannot understand why keywords like ?person are used. I can understand that ?person is used to select, well, people's names, but I would like to know how and where to find keywords like ?person and ?name to represent a specific entity.
SELECT ?uri ?label
WHERE {
?uri rdfs:label ?label .
filter(?label="car"@en)
}
I would really appreciate it if someone could link me to the part of the documentation where I can clearly read and understand that ?uri is used to select a URI of the form www.dbpedia.org/page/SomeEntity, and what ?person, ?name, and ?label represent.
I'm actually so lost.. I will go up and start eating one elephant at a time. For now, I'll be very grateful if I get an answer to this.
If you know of any way I can avoid learning and using SPARQL, that would work too! I know Python well enough, so leveraging an API to pull this information is also fine by me. This question was posted by me.
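For the Python route, the PrefixSearch URL quoted above can be assembled with the standard library alone. This is a minimal sketch: the base URL and parameter names are copied from the example link in this question, the function name is made up, and whether the service is still reachable at that address is not checked here.

```python
from urllib.parse import urlencode

# Base URL taken from the example link above; the service may have moved since.
LOOKUP_BASE = "http://lookup.dbpedia.org/api/search/PrefixSearch"


def prefix_search_url(query_string: str, max_hits: int, query_class: str = "") -> str:
    """Build a PrefixSearch URL (hypothetical helper, parameters as in the example link)."""
    params = {
        "QueryClass": query_class,
        "MaxHits": max_hits,
        "QueryString": query_string,
    }
    return LOOKUP_BASE + "?" + urlencode(params)


url = prefix_search_url("berl", 2)
print(url)
# http://lookup.dbpedia.org/api/search/PrefixSearch?QueryClass=&MaxHits=2&QueryString=berl
```

The resulting string could then be fetched with urllib.request or any HTTP client. Excluding results that fall under Species would still have to happen afterwards.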
Answer posted by @Stanislav-Kravin:
SELECT DISTINCT ?s
WHERE
{ ?s a owl:Thing .
?s rdfs:label ?label .
FILTER ( LANGMATCHES ( LANG ( ?label ), 'en' ) )
?label bif:contains '"apple"' .
FILTER NOT EXISTS { ?s rdf:type/rdfs:subClassOf* dbo:Species }
}

Querying DBpedia-Live with SPARQL does not give same answer as DBpedia

I want to query DBpedia with DBpedia Live endpoint.
I have this query :
SELECT *
WHERE {
?x a dbo:Person .
?x rdfs:label "Usain Bolt"@en .
}
This query gives the correct answer with most names I tried (for example "Teddy Riner"@en), but it fails with Usain Bolt and Rachid Badouri.
I don't get why, as their DBpedia pages (Teddy Riner, Usain Bolt) are constructed the same way: they both have an rdfs:label, which is written exactly as I wrote it.
It seems to me that there is an inconsistency between the endpoint and DBpedia, but I don't think it's because the endpoint is not up to date.
Even more surprising, this query gives the correct answer:
SELECT *
WHERE {
?x rdfs:label "Usain Bolt"@en .
}
However, Usain Bolt is a dbo:Person! Same thing for Rachid Badouri.
Could someone explain to me why the first query does not give an answer?
Any help would be appreciated! Thanks
According to DBpedia-Live, at the time of writing, the entity with rdfs:label "Usain Bolt"@en has many types, but is not a dbo:Person. The same holds for the entity with rdfs:label "Rachid Badouri"@en.
In contrast, the entity with rdfs:label "Teddy Riner"@en is a dbo:Person.
Note: DBpedia-Live content is a moving target, varying with Wikipedia content changes, adjustments in the templates, and other variables. The statements I made above may no longer be true when you read this.

Querying WikiData, difference between p and wdt default prefix

I am new to Wikidata and I can't figure out when I should use the wdt prefix (http://www.wikidata.org/prop/direct/) and when I should use the p prefix (http://www.wikidata.org/prop/) in my SPARQL queries. Can someone explain what each of these means and what the difference is?
Things in the p: namespace are used to select statements. Things in the wdt: namespace are used to select entities. Entity selection, with wdt:, allows you to simplify or summarize more complex queries involving statement selection.
When you see a p: you are usually going to see a ps: or pq: shortly following. This is because you rarely want a list of statements; you usually want to know something about those statements.
This example is a two-step process showing you all the graffiti in Wikidata:
SELECT ?graffiti ?graffitiLabel
WHERE
{
?graffiti p:P31 ?statement . # entities and their "instance of" statements
?statement ps:P31 wd:Q17514 . # statements whose value is graffiti
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Two different versions of the P31 property are used here, housed in different namespaces. Each version comes with different expectations about how it will connect to other items. Things in the p: namespace connect entities to statements, and things in the ps: namespace connect statements to values. In the example, p:P31 is used to select statements about an entity. The entity will be graffiti, but we do not specify that until the next line, where ps:P31 is used to select the values (objects) of the statements, specifying that those values should be graffiti.
So, that's kind of complicated! The wdt: namespace is supposed to make this kind of query simpler. The example could be rewritten as:
SELECT ?graffiti ?graffitiLabel
WHERE
{
?graffiti wdt:P31 wd:Q17514 . # entities that are graffiti
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
This is now one line shorter because we are no longer looking for statements about graffiti, but for graffiti itself. The dual p: and ps: linkages are summarized with a wdt: version of the same P31 property. However, be aware:
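This technique only returns "truthy" statements (the "t" in wdt: stands for "truthy"): the best-ranked statements for a property, meaning the preferred-rank statements if any exist and the normal-rank ones otherwise. Deprecated statements are never included, and qualifiers and references cannot be reached this way.
As a result, information available to wdt: is sometimes missing some facts. In my experience, a p: and ps: query will often return a few more results than a wdt: query.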
If you go to the Wikidata item page for Barack Obama at https://www.wikidata.org/wiki/Q76 and scroll down, you see the entry for the "spouse" property P26:
Think of the p: prefix as a way to get to the entire white box on the right side of the image.
In order to get to the information inside the white box, you need to dig deeper.
In order to get to the main part of the information ("Michelle Obama"), you combine the p: prefix with the ps: prefix like this:
SELECT ?spouse WHERE {
wd:Q76 p:P26 ?s .
?s ps:P26 ?spouse .
}
The variable ?s is an abstract statement node (aka the white box).
You can get the same information with only one triple in the body of the query by using wdt::
SELECT ?spouse WHERE {
wd:Q76 wdt:P26 ?spouse .
}
So why would you ever use p:?
You might have noticed that the white box also contains meta information ("start time" and "place of marriage").
In order to get to the meta information, you combine the p: prefix with the pq: prefix.
The following example query returns all the information together with the statement node:
SELECT ?s ?spouse ?time ?place WHERE {
wd:Q76 p:P26 ?s .
?s ps:P26 ?spouse .
?s pq:P580 ?time .
?s pq:P2842 ?place .
}
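The relationship between the prefixes can also be modelled as a small data structure. The Python sketch below is only a mental model, not how Wikidata stores anything: the item and property IDs are made-up placeholders, and the rank handling reflects the usual reading of "truthy" (preferred-rank statements if present, otherwise normal-rank, never deprecated).

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One statement node: the thing a p: link points to."""
    value: str                                       # reached via ps:
    rank: str = "normal"                             # "preferred", "normal" or "deprecated"
    qualifiers: dict = field(default_factory=dict)   # reached via pq:


# Hypothetical item with two statements for one made-up property.
statements = {
    ("wd:Q_item", "P_prop"): [
        Statement("wd:Q_current", rank="preferred", qualifiers={"P580": "2001-01-01"}),
        Statement("wd:Q_former", rank="normal", qualifiers={"P580": "1990-01-01"}),
    ]
}


def via_p_ps(subject: str, prop: str) -> list:
    """Like `?s p:P ?st . ?st ps:P ?v`: the value of every statement."""
    return [st.value for st in statements.get((subject, prop), [])]


def via_wdt(subject: str, prop: str) -> list:
    """Like `?s wdt:P ?v`: only the best-ranked, non-deprecated values."""
    live = [st for st in statements.get((subject, prop), []) if st.rank != "deprecated"]
    preferred = [st for st in live if st.rank == "preferred"]
    return [st.value for st in (preferred or live)]


print(via_p_ps("wd:Q_item", "P_prop"))  # ['wd:Q_current', 'wd:Q_former']
print(via_wdt("wd:Q_item", "P_prop"))   # ['wd:Q_current']
# Qualifiers (pq:) are only reachable through the statement node:
print(statements[("wd:Q_item", "P_prop")][0].qualifiers["P580"])  # 2001-01-01
```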
They're simply XML namespace prefixes, basically a shortcut for full URIs. So given wdt:Apples, the full URI is http://www.wikidata.org/prop/direct/Apples and given p:fruitType the URI is http://www.wikidata.org/prop/fruitType.
Prefixes/namespaces have no other meaning, they are simply ways to define the name of something with URL format. However conventions, such as defining properties in http://www.wikidata.org/prop/, are useful to separate the meanings of terms, so 'direct' is likely a sub-type of property as well (in this case having to do with wikipedia dumps).
For the specifics, you'd need to hope the authors have exposed some naming convention, or be caught in a loop of "was it p:P51 or p:P15 or maybe wdt:P51?". And may luck be with you because the "semantics" of semantic technology have been lost.

sparql empty result for dbpedia-owl:influenced property

I am trying to retrieve the value of dbpedia-owl:influenced on a page such as Andy_Warhol.
The query I write is:
PREFIX rsc : http://dbpedia.org/resource
PREFIX dbpedia-owl :http://dbpedia.org/ontology
SELECT ?o WHERE {
rsc:Andy_Warhol dbpedia-owl:infuenced ?o .
}
but it is EMPTY.
Strangely, when I run the same query for another property from the ontology, such as "birthPlace", the SPARQL engine does return a result:
SELECT ?o WHERE {
rsc:Andy_Warhol dbpedia-owl:birthplace ?o .
}
which is a link to another resource:
dbpedia.org/resource/Pittsburgh
I am just confused how to write this query?
Besides the several formal errors addressed in @Joshua's answer, there is also a semantic problem: in this case, the property you are looking for seems to be recorded on the entities that were influenced.
This query might give you the desired results:
PREFIX rsc: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?s WHERE {
?s dbpedia-owl:influencedBy rsc:Andy_Warhol .
}
There are a few issues here. One is that the SPARQL, as presented, isn't correct. I edited to make the prefix syntax legal, but the prefixes were still wrong (they didn't end with a final slash). You don't want to be querying for http://dbpedia.org/resourceAndy_Warhol after all; you want to query for http://dbpedia.org/resource/Andy_Warhol. Some standard namespaces for DBpedia are listed on their SPARQL endpoint. Using those namespaces and the SPARQL endpoint, we can ask for all the triples that have http://dbpedia.org/resource/Andy_Warhol as the subject with this query:
SELECT * WHERE {
dbpedia:Andy_Warhol ?p ?o .
}
In the results produced there, you'll see the one using http://dbpedia.org/ontology/birthPlace (note the capital P in birthPlace), but you won't see any triples with the predicate http://dbpedia.org/ontology/infuenced, so it makes sense that your first query has no results. Do you have some reason to suppose that there should be some results?
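The effect of the missing trailing slash is plain string concatenation, which a few lines of Python can make visible. This is only a sketch of how a prefixed name expands; the expand helper is made up for illustration and is not part of any SPARQL library.

```python
def expand(prefixed_name: str, prefixes: dict) -> str:
    """Expand 'prefix:local' by concatenating the namespace IRI and the local part."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


broken = {"rsc": "http://dbpedia.org/resource"}   # no trailing slash, as in the question
fixed = {"rsc": "http://dbpedia.org/resource/"}   # with trailing slash

print(expand("rsc:Andy_Warhol", broken))  # http://dbpedia.org/resourceAndy_Warhol
print(expand("rsc:Andy_Warhol", fixed))   # http://dbpedia.org/resource/Andy_Warhol
```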

Limit a SPARQL query to one dataset

I'm working with the following SPARQL query, which is an example on the web-based end of my institution's SPARQL endpoint;
SELECT ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
The problem is that as well as getting data from 'Buildings and Places', the Dataset I'm interested in, and would expect the example to use, it also gets data from the 'Facilities and Equipment' dataset, which isn't relevant. You should see this if you follow the link.
I suspect the example may pre-date the addition of the Facilities and Equipment dataset, but even with the research I've done into SPARQL, I can't see a clear way to define which datasets to include.
Can anyone recommend a starting point to limit it to just show 'Buildings', or, more specifically, results from the 'Buildings and Places' dataset.
Thanks
First things first, you really need to use SELECT DISTINCT, as otherwise you'll get repeated results.
To answer your question, you can use GRAPH { ... } to restrict certain parts of a SPARQL query to only match data from a specific dataset. This only works if the SPARQL endpoint is divided up into GRAPHs (this one is). The solution you asked for isn't the best choice, as it assumes that things within sites in the 'places' dataset will always be restricted to buildings. That's risky, as the dataset might end up containing trees and signposts at some time in the future.
Step one is to just find out what graphs are in play:
SELECT DISTINCT ?g1 ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
GRAPH ?g1 { ?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
}
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
Try it here: http://is.gd/WdRAGX
From this you can see that http://id.southampton.ac.uk/dataset/places/latest and http://id.southampton.ac.uk/dataset/places/facilities are the two relevant ones.
To only look for things 'within' a site according to the "places" graph, use:
SELECT DISTINCT ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
GRAPH <http://id.southampton.ac.uk/dataset/places/latest> {
?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
}
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
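What GRAPH does here can be pictured with a toy quad store in Python. This is a rough sketch, not how the endpoint works internally: the two graph IRIs are the ones named above, while the subjects, the site identifier, and the within helper are invented for illustration.

```python
PLACES = "http://id.southampton.ac.uk/dataset/places/latest"
FACILITIES = "http://id.southampton.ac.uk/dataset/places/facilities"

# Each entry is (graph, subject, predicate, object); the subjects are hypothetical.
quads = [
    (PLACES, "ex:building-1", "spacerel:within", "ex:highfield"),
    (FACILITIES, "ex:facility-7", "spacerel:within", "ex:highfield"),
]


def within(site: str, graph: str = None) -> list:
    """Return (graph, subject) pairs for things within `site`.

    graph=None mimics GRAPH ?g1 { ... }: any named graph, reporting which one.
    Passing a graph IRI mimics GRAPH <iri> { ... }: only that graph is searched.
    """
    return [
        (g, s)
        for (g, s, p, o) in quads
        if p == "spacerel:within" and o == site and (graph is None or g == graph)
    ]


print(len(within("ex:highfield")))     # 2: both datasets contribute
print(within("ex:highfield", PLACES))  # only the entry from the places graph
```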
Alternate solutions:
Using rdf:type
Above I've answered your question, but it's not the answer to your problem. This solution is more semantic as it actually says 'only give me buildings within the campus' which is what you really mean.
Instead of filtering by graph, which is not very 'semantic', you could also restrict ?building to be of class 'building', which research facilities are not. Facilities are still sometimes listed as 'within' a site, usually when the university has only published which campus they are on but not which building.
?building a rooms:Building
Using FILTER
In extreme cases you may not have data in different GRAPHS and there may not be an elegant relationship to use to filter your results. In this case you can use a FILTER and turn the building URI into a string and use a regular expression to match acceptable ones:
FILTER regex(str(?building), "^http://id.southampton.ac.uk/building/")
This is by far the worst option; don't use it unless you have to.
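To preview which URIs such a filter would keep, the same pattern can be tried in Python. This is only an approximation of the SPARQL regex function, and the two sample URIs are made up. Note that the unescaped dots in the pattern match any character, in both SPARQL and Python.

```python
import re

# Same pattern as in the FILTER above.
PATTERN = r"^http://id.southampton.ac.uk/building/"


def is_building_uri(uri: str) -> bool:
    """True if the URI starts with the building namespace (unanchored search, like regex())."""
    return re.search(PATTERN, uri) is not None


uris = [
    "http://id.southampton.ac.uk/building/32",   # hypothetical building URI
    "http://id.southampton.ac.uk/facility/abc",  # hypothetical facility URI
]

print([u for u in uris if is_building_uri(u)])
# ['http://id.southampton.ac.uk/building/32']
```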
Belt and Braces
You can use any of these restrictions together. A combination of restricting the GRAPH and ensuring that all ?buildings really are buildings would be my recommended solution.