Querying DBPedia using SPARQL to find subjects with same objects - sparql

I am trying to find all those resources from dbpedia for eg rdf:type person who have same object eg date of birth.?
I thought of doing it with subquery but its definitely not the solution.
Can anyone provide some useful pointer?

From what you describe I think you mean:
prefix dbp: <http://dbpedia.org/property/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
select ?s1 ?s2 ?dob
where {
?s1 a foaf:Person ; dbp:birthDate ?dob . # Find a person, get their dob
?s2 a foaf:Person ; dbp:birthDate ?dob . # Find a person with the same dob
}
Adjust type and predicate to suit.
This will include some redundancy: you will find answers where the subjects are the same ('Napoleon' 'Napoleon') and get answers twice ('Daniel Dennett' 'Neil Kinnock', 'Neil Kinnock' 'Daniel Dennett'). You can remove that with a filter:
filter (?s1 < ?s2)
which just ensures that one comes before the other (however the query engine wants to do that).
prefix dbp: <http://dbpedia.org/property/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
select ?s1 ?s2 ?dob
where {
?s1 a foaf:Person ; dbp:birthDate ?dob .
?s2 a foaf:Person ; dbp:birthDate ?dob .
filter (?s1 < ?s2)
}
See the result

A SPARQL query is basically a set of triple patterns, i.e., a join (logical AND) of queries of the form
?subject ?predicate ?object.
What you need is identical ?object. Considering that you only care about ?subject (?predicate is not of importance), you can perform such a query you by ordering the results depending on ?object. Thus you will see results sharing ?object together.
select ?s ?p ?o where {
?s ?p ?o.
}
order by ?o
If you care about ?predicate as well, you should order the result using it second.
select ?s ?p ?o where {
?s ?p ?o.
}
order by ?o ?p
As those couple of queries may involve too many results as they will retrieve all the results possible. I recommend filtering ?object depending on some specific criteria. For example, to select all ?subject sharing an instance of Person as their ?object, use:
select ?s where {
?s ?p ?o.
{select ?o where{
?o a <http://dbpedia.org/ontology/Person>}
}
}

An alternative solution to the others is using aggregate functions like in this query template
select ?o (count(distinct ?s) as ?cnt) (group_concat(distinct ?s; separator=";") as ?subjects) {
?s a <CLASS> ;
<PREDICATE> ?o .
}
group by ?o
order by desc(count(distinct ?s))
which returns for each object the number of subjects and the list of subject belonging to a class CLASS for a given predicate PREDICATE
For example, asking for the dates of soccer players one could use
prefix dbo: <http://dbpedia.org/ontology/>
select ?date (count(distinct ?s) as ?cnt) (group_concat(distinct ?s; separator=";") as ?subjects) {
?s a dbo:SoccerPlayer ;
dbo:birthDate ?date .
}
group by ?date
order by desc(count(distinct ?s))

select * where {
?person1 a <http://dbpedia.org/ontology/Person>.
?person1 dbo:birthYear ?date.
?person2 a <http://dbpedia.org/ontology/Person>.
?person2 dbo:birthYear ?date
FILTER (?person1 != ?person2)
}
limit 10
Dbpedia will not allow you to execute that query on its public endpoint because it consumes more time that allowed, and you cannot change that time. Nevertheless, there are ways to execute it

Related

Sparql find entities with dbo type and sort them by count

I have to do a sparql query to the dbpedia endpoint which needs to:
Find all the entities containing "vienna" in the label and "city" in the abstract
Filter them keeping only the ones that have at least one dbo rdf:type
Sort the results by count of dbo types (e.g. if an entity has 5 dbo rdf:type it has to be shown before entities with 4 dbo rdf:type)
I did several attempts, the closest to the result is:
select distinct (str(?s) as ?s) count(?t) as ?total where {{ ?s rdfs:label "vienna"#en. ?s rdf:type ?t.}
UNION { ?s rdfs:label ?l. ?s rdf:type ?t . ?l <bif:contains> '("vienna")'
. FILTER EXISTS { ?s dbo:abstract ?cc. ?cc <bif:contains> '("city")'. FILTER(lang(?cc) = "en").}}
FILTER (!strstarts(str(?s), str("http://dbpedia.org/resource/Category:")))
. FILTER (!strstarts(str(?s), str("http://dbpedia.org/property/")))
. FILTER (!strstarts(str(?s), str("http://dbpedia.org/ontology/")))
. FILTER (strstarts(str(?t), str("http://dbpedia.org/ontology/"))).}
LIMIT 50
Which will (wrongly) count the rdf:type before actually filtering it. I don't want to count rdf:type that are not dbo (ontology).
The idea is to use a subquery in which you search for the entities and to do the counting in the outer query:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s (count(*) AS ?cnt)
WHERE
{ { SELECT DISTINCT ?s
WHERE
{ ?s rdfs:label ?l .
?l <bif:contains> '"vienna"'
FILTER langMatches(lang(?l), "en")
FILTER EXISTS { ?s dbo:abstract ?cc .
?cc <bif:contains> '"city"'
FILTER langMatches(lang(?cc), "en")
}
?s rdf:type ?t
FILTER ( ! strstarts(str(?s), str("http://dbpedia.org/resource/Category:")) )
FILTER ( ! strstarts(str(?s), str("http://dbpedia.org/property/")) )
FILTER ( ! strstarts(str(?s), str(dbo:)) )
FILTER strstarts(str(?t), str(dbo:))
}
}
?s ?p ?o
FILTER strstarts(str(?p), str(dbo:))
}
GROUP BY ?s
ORDER BY DESC(?cnt)

How to return all S->P->O triples from a starting resource to a specified path depth?

My goal is to graphically represent the S->P->O relations within a depth two edges from the specified resource, p:Person_1. I want all relations within that path length to be returned from my query as ?s, ?p, ?o for further processing in my graphical application.
I tried the first query below which gives me my first set of ?s ?p ?o with repeats, then ?p2, ?o2, ?p3, ?o3 as additional columns in the result. I want to bind ?p2 and ?p3 to ?p, ?o2 and ?o3 to ?o.
SELECT *
WHERE {
p:Person_1 ?p ?o .
BIND("p:Person_1" as ?s)
OPTIONAL{
?o ?p2 ?o2 .
}
OPTIONAL{
?o2 ?p3 ?o3 .
}
}
Then, based on How do I construct get the whole sub graph from a given resource in RDF Graph?, I tried using CONSTRUCT to return the graph.
PREFIX p: <http://www.example.org/person/>
PREFIX x: <example.org/foo/>
construct { ?s ?p ?o }
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:|!x:)* ?s .
?s ?p ?o .
}
I am using Virtuoso and I get the error:
Virtuoso 37000 Error SP031: SPARQL compiler: Variable ?_::trans_subj_9_3 in T_IN list is not a value from some triple
I could post-process the result from my first query but I want to learn how to do this correctly with SPARQL, preferably on Virtuoso.
Update after testing the advice from #AKSW :
Both CONSTRUCT and SELECT statements work with the pattern suggested.
CONSTRUCT { ?s ?p ?o }
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar)* ?s .
?s ?p ?o .
} LIMIT 100
and:
SELECT s ?p ?o
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar)* ?s .
?s ?p ?o .
} LIMIT 100
The SELECT results in several duplicates that cannot be removed using DISTINCT, which results in an error that I assume is due to the 'datatype' of some of the returned values.
Virtuoso 22023 Error SR066: Unsupported case in CONVERT (DATETIME -> IRI_ID)
It appears some post-SPARQL processing is in order.
This gets me most of the way there. Still hoping I can find a solution for SPARQL that is like Cypher's "number of hops away" :
OPTIONAL MATCH path=s-[*1..3]-(o)
Here is a SPARQL query that works in Virtuoso. Note the SPARQL W3C standard does not support this syntax and it will fail in other triplestores.
PREFIX p: <http://www.example.org/person/>
PREFIX x: <example.org/foo/>
# CONSTRUCT {?s ?p ?o} # If you wish to return the graph
SELECT ?s ?p ?o # To return the triples
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar){1,3} ?s .
?s ?p ?o .
}LIMIT 100
See also K. Idehen's wiki entry here: http://linkedwiki.com/exampleView.php?ex_id=141
And thanks to #Joshua Taylor for advice in the same area.
Working Drafts of SPARQL 1.1 Property Paths included the {n,m} operator for handling this issue, which was implemented (and will remain supported) in Virtuoso. Here's a tweak to #tim's response.
Live SPARQL Query Results Page using the DBpedia endpoint (which is a Virtuoso instance).
Live SPARQL Query Definition Page that opens up query source code in the default DBpedia query editor.
Actual Query Example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s AS ?Entity
?o AS ?Category
WHERE {
?s rdf:type <http://dbpedia.org/ontology/AcademicJournal> ;
rdf:type{1,3} ?o
}
LIMIT 100
Should you be looking for LinkedIn-like presentation of Contact Networks and Degrees of Separation between individuals, here is an example using Virtuoso-specific SPARQL Extensions that solve this particular issue:
SELECT ?o AS ?WebID
((SELECT COUNT (*) WHERE {?o foaf:knows ?xx})) AS ?contact_network_size
?dist AS ?DegreeOfSeparation
<http://www.w3.org/People/Berners-Lee/card#i> AS ?knowee
WHERE
{
{
SELECT ?s ?o
WHERE
{
?s foaf:knows ?o
}
} OPTION (TRANSITIVE, t_distinct, t_in(?s), t_out(?o), t_min (1), t_max (4), t_step ('step_no') AS ?dist) .
FILTER (?s= <http://www.w3.org/People/Berners-Lee/card#i>)
FILTER (isIRI(?s) and isIRI(?o))
}
ORDER BY ?dist DESC (?contact_network_size)
LIMIT 500
Note: this approach is the only way (at the current time) to expose actual relational hops between entities in an Entity Relationship Graph that includes Transitive relations.
Live Link to Query Results
Live Link to Query Source Code
Bearing in mind that the r{n,m} operator was deprecated in the final SPARQL 1.1 (but will remain supported in Virtuoso), you can use r/r?/r? instead of r{1,3}, if you want to work strictly off the current spec:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s AS ?Entity
?o AS ?Category
WHERE {
?s rdf:type <http://dbpedia.org/ontology/AcademicJournal> ;
rdf:type / rdf:type? / rdf:type? ?o
}
LIMIT 100
Here's a live example, against the DBpedia instance hosted in Virtuoso.

SPARQL: extract the dbo class for each result

I have a DBpedia request that give me a label, a Dpedia URI and the corresponding wikipedia link. I want to add a column to get the dbo class of each line. Can any one help me please?
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?Nom ?resource ?url ?p
where {
?resource rdfs:label ?Nom.
?resource foaf:isPrimaryTopicOf ?url.
?resource rdf:type ?p
FILTER(regex(?p,"http://dbpedia.org/ontology/"))
FILTER ( langMatches( lang(?Nom), "EN" )).
?Nom <bif:contains> "Apple".
}
Firstly, if you have the solution, then you should post it as an answer and then mark it as accepted so that it is easier for others looking for the solution.
Secondly, I feel that the solution you came up would work but is not the right way to do it. For getting dbo: types, one should query for types that are owl:Class types.
This is how I would do it
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
select distinct ?Nom ?resource ?url ?p
where {
?resource rdfs:label ?Nom ;
foaf:isPrimaryTopicOf ?url ;
a ?p .
?p a owl:Class .
FILTER ( langMatches( lang(?Nom), "EN" )).
?Nom <bif:contains> "Apple".
} limit 100
Not sure what do you mean by:
get the dbo class of each line.
But if you mean to get all predicates related to each row, the following query may help. I limited them to English ones to see the results better.
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
select distinct ?Nom ?resource ?url ?p
where {
?resource rdfs:label ?Nom.
?resource ?p ?dbo.
?resource foaf:isPrimaryTopicOf ?url.
FILTER ( langMatches( lang(?Nom), "EN" )).
?Nom <bif:contains> "Apple".
FILTER langMatches( lang(?Nom), "en" )
}
limit 100

Dbpedia/sparql: get population & lat/lng of all cities/towns/villages in UK

I'm entering the following query at http://dbpedia.org/sparql:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?s ?name ?value ?lat ?lng
WHERE {
?s a <http://dbpedia.org/ontology/PopulatedPlace> .
?s <http://dbpedia.org/property/name> ?name .
?s <http://dbpedia.org/property/populationTotal> ?value .
FILTER (?lng > -8.64 AND ?lng < 2.1 AND ?lat < 61.1 AND ?lat > 49.35 )
?s geo:lat ?lat .
?s geo:long ?lng .
}
(The bounding box is intended to be for the UK, the other option is to add <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_Kingdom> ., but there's a possibility that some places might not have been tagged with UK as the country).
The problem is that it doesn't seem to be pulling back many places (around 290).
Swapping population for populationTotal gives 1588 places, and I can't figure out (semantically) which one should be used.
Is this a limitation with the underlying data, or is there something that could be improved in the way I'm formulating the query?
Note: this question is mainly academic now as I got the info from http://download.geonames.org/export/dump/GB.zip, but I'd much prefer to use open data and the semantic web, so posting up this question to see if there was something I was missing, or to find out if there is a shortcoming in how the data is being scraped from Wikipedia and whether I can muck in.
Your query is only returning locations that have a value for populationTotal. For example, if Town A has "10,000" for populationTotal in the database, and Town B has NULL, only Town A will be returned.
If you want to return all locations in the UK, then you need to specify population as an optional parameter. This query will show you all the locations, as well as the populations for the ones that have that data.
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?s ?name ?value ?lat ?lng
WHERE {
?s a <http://dbpedia.org/ontology/PopulatedPlace> .
?s <http://dbpedia.org/property/name> ?name .
OPTIONAL { ?s <http://dbpedia.org/property/populationTotal> ?value . }
FILTER (?lng > -8.64 AND ?lng < 2.1 AND ?lat < 61.1 AND ?lat > 49.35 )
?s geo:lat ?lat .
?s geo:long ?lng .
}

GROUP BY in SPARQL?

Assume that I uses the FOAF ontology. I want to return the name and the mbox for each person. I use the following query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name ?email
WHERE {
?person foaf:name ?name.
?person foaf:mbox ?email.
}
The result of the join is a set of rows: ?person, ?name, ?email. This query is returning the ?name and ?email. Note that in some of the ?person may have multiple mailboxes, so in the returned set, a ?name row may appear multiple times, once for each mailbox.
Is there a solution to make a GROUP BY person ?name?
You can group by person but then you need an aggregation for ?name and ?email
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT (sample(?name) AS ?name2) (sample(?email) as ?email2)
WHERE {
?person foaf:name ?name.
?person foaf:mbox ?email.
} GROUP BY ?person
SAMPLE picks one possible from the group for each ?person.
or maybe
SELECT (group_concat(?name) AS ?names)
(except that's a string).
It may be easier work with
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
?person foaf:name ?name.
?person foaf:mbox ?email.
}
ORDER BY ?person ?name ?email
and process the results in your application where you know the incoming results have all the entries for one person is a single section of the results.