How To Efficiently Design Nested OPTIONAL Clauses In SPARQL

I want to retrieve data that has optional elements at multiple levels. For example, assume I have four ancestors - Fred, Sam, George, and Mark. Fred and Sam have kids ... George and Mark do not. All of Fred's kids have nicknames, but two of Sam's four kids do not.
I want to query all of the kids of my ancestors and return their names, ages, and nicknames.
It seems like this would work:
SELECT DISTINCT ?token ?ancestorName ?childName ?childAge ?childNickname
WHERE
{
FILTER ( ?token IN ("Fred","Sam","George","Mark") )
?ancestor foo:name ?token .
?ancestor foo:fullname ?ancestorName .
OPTIONAL
{
?ancestor foo:parentOf ?child .
?child foo:fullname ?childName .
?child foo:age ?childAge .
OPTIONAL { ?child foo:nickname ?childNickname }
}
}
Everything seems to work fine if an ancestor doesn't have a child: the whole outer OPTIONAL clause returns quickly with no data. If the ancestor has children, and each has a nickname, it returns quickly and fills in the data. The problem seems to happen when the ancestor has a child, but the child does not have a nickname.
It works, but it takes a very long time (I have lots of data). It appears that the inner OPTIONAL clause
OPTIONAL { ?child foo:nickname ?childNickname }
... does a cross product ... combining every ?child with every ?childNickname ... and then returns the right value.
How can I write this SPARQL SELECT to run efficiently (not do a cross product) and return all of the ancestors and all of the kids even if a kid doesn't have a nickname? I've tried FILTERs. I've tried checking whether ?child was BOUND. I haven't found the secret to making it run quickly.
Thanks for the help!
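One restructuring worth trying is to bind ?token with VALUES (SPARQL 1.1) instead of a leading FILTER ... IN, so that the engine can start from the four known literals rather than filtering candidate bindings afterwards. This is only a sketch: the foo: namespace IRI is a hypothetical placeholder, the nested OPTIONAL is kept as in the query above, and whether it actually avoids the slow plan depends on the triple store's optimizer.

```sparql
# Hypothetical namespace; substitute the real foo: IRI.
PREFIX foo: <http://example.org/foo#>

SELECT DISTINCT ?token ?ancestorName ?childName ?childAge ?childNickname
WHERE
{
  # Bind the tokens up front instead of filtering with IN.
  VALUES ?token { "Fred" "Sam" "George" "Mark" }
  ?ancestor foo:name ?token ;
            foo:fullname ?ancestorName .
  OPTIONAL
  {
    ?ancestor foo:parentOf ?child .
    ?child foo:fullname ?childName ;
           foo:age ?childAge .
    # ?child is already bound here, so the nickname lookup is per child.
    OPTIONAL { ?child foo:nickname ?childNickname }
  }
}
```

The result set should be the same as for the original query; only the way ?token gets its bindings changes.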

Related

Restrict SPARQL property path predicates based on blank node attached meta / reified data

I want to traverse graph starting from any "root" concept and getting down to its leaf concepts moving by reified predicates of certain type (e.g. hasChild only).
I have a large graph in which Concepts C are connected with named predicates R.
Predicates are in turn attached to blank nodes B which hold their meta data, including connection type CT.
Essentially the pattern is:
root_concept -[^subject]-> blank -[object]-> concept -[^subject]-> blank -[object]-> concept -[^subject]-> blank -[object]-> ...
And I want to get all downstream concepts.
So normally you would do something like (1):
SELECT ?c
WHERE {
FILTER( ?root = <SomeValue> ) .
?root (^rdf:subject/rdf:object)* ?c .
}
But I need to get intermediate blank node B and filter on its CT!
For a single step this looks like (2):
SELECT ?c
WHERE {
FILTER( ?root = <SomeValue> ) .
?root ^rdf:subject ?blank .
?blank rdf:object ?c .
?blank :hasRelationType "hasChild" .
}
Question: How can I marry (1) and (2)?
Additional info:
Each blank node has at least these triples:
1 blank1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://some/ontology/terms/Relationship
3 blank1 http://www.w3.org/1999/02/22-rdf-syntax-ns#object C2
4 blank1 http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate R
5 blank1 http://www.w3.org/1999/02/22-rdf-syntax-ns#subject C1
6 blank1 http://some/ontology/terms/hasRelationType NotHasChild (!)
Predicates represented by blank nodes B actually have their own IDs, but they are completely uninformative (i.e. rdf:predicate from blank nodes leads to something like R0134235)
I have tried modeling my query on arbitrary-length property paths and on the general recipe for getting all child properties, but couldn't figure it out.

A SPARQL query to find entities that are similar to an entity

I have the URI of an entity (:P1) and I want to find all entities that are similar to this entity. Right now, I am trying to find those entities that are connected to :P1 through a common entity and the same attribute. I have a query like this.
SELECT ?simP (SUM(?score) AS ?simScore)
WHERE {
{
:P1 ?prop ?q.
?q ?propInv ?simP.
?propInv owl:inverseOf ?prop.
FILTER(?simP != :P1).
BIND(1 AS ?score).
} UNION {
:P1 ?prop1 ?q1.
?q1 ?prop2 ?q.
?q ?prop2Inv ?q2.
?q2 ?prop1Inv ?simP.
?prop1Inv owl:inverseOf ?prop1.
?prop2Inv owl:inverseOf ?prop2.
FILTER(?simP != :P1).
BIND(0.5 AS ?score).
}
}
GROUP BY ?simP
ORDER BY DESC(?simScore)
As you can see, I am trying to find those entities that are connected to :P1 through a common entity at a distance of 1 hop and then 2 hops. Scores (the reciprocal of the number of hops) decrease as the number of hops increases.
My issues with this query are:
It requires that each property have an inverse (owl:inverseOf) defined, which need not always be the case.
The number of statements in the query keeps growing as I increase the number of hops from 1 to 2 to 3 and so on.
My question is whether there is a better way to get the outcome I am expecting. It would be great if the query could at the very least get entities that are connected to :P1 through a common entity at least 2 hops away without having to add a UNION clause for each hop.
Also, is there a better approach to finding entities considered "similar" to :P1?
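For the first issue, one possible way to drop the owl:inverseOf requirement at the 1-hop level is to match the same property in the same direction from both entities: two entities count as similar when they share a (property, value) pair. This is a sketch of a different similarity measure rather than a drop-in replacement; the default prefix IRI is a hypothetical placeholder, and the score is simply the number of shared pairs.

```sparql
# Hypothetical default namespace; substitute the real one.
PREFIX : <http://example.org/>

SELECT ?simP (COUNT(*) AS ?simScore)
WHERE {
  :P1   ?prop ?q .      # a property value of :P1 ...
  ?simP ?prop ?q .      # ... that another entity has via the same property
  FILTER(?simP != :P1)
}
GROUP BY ?simP
ORDER BY DESC(?simScore)
```

It does not address the multi-hop case, which still needs either one pattern per hop or a property path over a fixed set of predicates.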

Retrieve the US release date for a movie from Wikidata using Sparql

I am trying to retrieve the titles and release dates (publication date) for movies using the wikidata.org SPARQL endpoint (https://query.wikidata.org/). The titles are listed in different languages, which are filtered in the query below. However, some movies also have several publication dates (e.g. for different countries), e.g. https://www.wikidata.org/wiki/Q217020. I'm not sure how the RDF triple structure is actually used to assign a country to the value of another triple, but specifically, how can I only retrieve the publication date for a movie in the US?
SELECT ?item ?title ?publicationdate
WHERE {
?item wdt:P31 wd:Q11424 ;
rdfs:label ?title ;
wdt:P577 ?publicationdate ;
filter ( lang(?title) = "en" )
}
ORDER BY ?item
LIMIT 10
Solution
The solution provided by M.Sarmini works. Apparently, facts such as the publication date are stored as n-ary relations: each statement gets its own node that links the item to the value and to any qualifiers. wdt:P577 leads straight to the date value, whereas p:P577 leads to the statement node, from which the date (via the statement prefix) and qualifiers such as the place of publication (via the qualifier prefix) can be reached.
Just add a new variable to hold the place of publication and filter your results to just list US films like this:
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX s: <http://www.wikidata.org/prop/statement/>
SELECT distinct ?item ?title ?publicationdate
WHERE {
?item wdt:P31 wd:Q11424;
rdfs:label ?title;
p:P577 ?placeofpublication.
?placeofpublication q:P291 wd:Q30.
?placeofpublication s:P577 ?publicationdate;
filter ( lang(?title) = "en")
}
ORDER BY ?item

How to get the path length between a child and parent node using skos:broader*

I have the following query getting the terminal leaf nodes from a parent category
select distinct ?subcat where {
?subcat skos:broader* category:Buildings_and_structures_in_France_by_city .
optional { ?subsubcat skos:broader ?subcat }
}
group by ?subcat
having (count(?subsubcat) = 0)
How do I get the path length between the child node ?subcat and the parent node category:Buildings_and_structures_in_France_by_city such that the output would be something like?
If the real task is finding buildings and structures in France, then you can ask for things in that category of some appropriate types. E.g.,
select distinct ?building where {
values ?type { dbpedia-owl:ArchitecturalStructure
dbpedia-owl:Building
dbpedia-owl:Place }
?building a ?type ;
dcterms:subject/skos:broader* category:Buildings_and_structures_in_France_by_city
}
That gets just about 700 results. If you find some that aren't in France, take a look at their values and see what you could use to exclude them. Perhaps you could add a filter to restrict latitude and longitude, or country values, etc.
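As for the path length itself, a commonly used approach is to count the nodes that lie between the child and the parent. The sketch below assumes the usual DBpedia prefix IRIs for skos: and category:, and it only equals the path length when there is a single skos:broader chain from ?subcat to the parent; with several chains it counts all distinct categories on any of them.

```sparql
PREFIX skos:     <http://www.w3.org/2004/02/skos/core#>
PREFIX category: <http://dbpedia.org/resource/Category:>

select ?subcat (count(distinct ?mid) as ?depth) where {
  # ?mid ranges over ?subcat itself and every category above it ...
  ?subcat skos:broader* ?mid .
  # ... that is still strictly below the parent category.
  ?mid skos:broader+ category:Buildings_and_structures_in_France_by_city .
}
group by ?subcat
order by ?depth
```

A direct child of the parent gets depth 1 (only ?mid = ?subcat matches), a grandchild gets 2, and so on. The parent category itself is not returned.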

Finding proximity between two entity using Category matching

All queries were tested on a Virtuoso SPARQL endpoint.
I want to find the categories of two DBpedia subjects, here Bharatiya_Janata_Party and New_Delhi, and measure how similar their categories are to each other.
With the first query I get the categories of Bharatiya_Janata_Party.
With the second query I get the categories of New_Delhi.
Now I want to match each category of Bharatiya_Janata_Party against each category of New_Delhi, like this:
Nationalist_parties---New_Delhi
Nationalist_parties---New_Delhi_district
Nationalist_parties---Populated_places_established_in_1911
Nationalist_parties---Capitals_in_Asia
Nationalist_parties---Capitals_in_Asia
Nationalist_parties---Planned_capitals
Political_parties_established_in_1980---New_Delhi
Political_parties_established_in_1980---New_Delhi_district
...
I ran Query III to find a match between Nationalist_parties and New_Delhi, and got the first match at level 4, i.e. (^skos:broader){0,4}.
Similarly, I have to do the same again for Nationalist_parties---New_Delhi_district.
The real problem is that I want to combine these 3 queries so that I get the result directly in tabular form. Is there any way to automate the whole process?
Query I:
SELECT *
WHERE {
dbpedia:Bharatiya_Janata_Party dcterms:subject ?x
}
Result of Query I:
dbpedia.org/resource/Category:Nationalist_parties
dbpedia.org/resource/Category:Political_parties_established_in_1980
dbpedia.org/resource/Category:Conservative_parties_in_India
dbpedia.org/resource/Category:Hindu_political_parties
dbpedia.org/resource/Category:Hindutva
dbpedia.org/resource/Category:Bharatiya_Janata_Party
dbpedia.org/resource/Category:1980_establishments_in_India
Query II:
SELECT *
WHERE {
dbpedia:New_Delhi dcterms:subject ?x
}
Result of Query II:
dbpedia.org/resource/Category:New_Delhi
dbpedia.org/resource/Category:New_Delhi_district
dbpedia.org/resource/Category:Populated_places_established_in_1911
dbpedia.org/resource/Category:Capitals_in_Asia
dbpedia.org/resource/Category:Indian_capital_cities
dbpedia.org/resource/Category:Planned_capitals
dbpedia.org/resource/Category:Urdu-speaking_countries_and_territories
Query III:
select distinct ?super where {
?super (^skos:broader){0,4} category:Nationalist_parties, category:New_Delhi
}
Result:
dbpedia.org/resource/Category:Government-related_organizations
dbpedia.org/resource/Category:Government
First Match at level 4 with 2 Super Classes
P.S.: The other pairs will not necessarily match at (^skos:broader){0,4}. So I am manually running the above query starting from (^skos:broader){0,0} and incrementing to (^skos:broader){0,1}, (^skos:broader){0,2}, ... until the first match.
select distinct ?super where {
?super (^skos:broader){0,6} category:Nationalist_parties, category:New_Delhi_district
}
Result:
dbpedia.org/resource/Category:Categories_by_topic
dbpedia.org/resource/Category:Government
dbpedia.org/resource/Category:Categories_by_parameter
dbpedia.org/resource/Category:Political_geography
First Match at level 6 with 4 Super Classes
Combining these 3 queries, I want a result like this in tabular form:
Category (Query I) | Category (Query II) | Level | Match count
Nationalist_parties | New_Delhi | 4 | 2
Nationalist_parties | New_Delhi_district | 6 | 4
Nationalist_parties | Populated_places_established_in_1911 | |
Nationalist_parties | Capitals_in_Asia | |
...
Please help me automate and combine the above queries. I have read several posts but have not been able to figure out how.
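A partial sketch of how the three queries might be combined for one fixed depth bound: it pairs every category of the first subject with every category of the second and counts their common supercategories within four skos:broader steps. It does not compute the level of the first match, and pairs with no common supercategory within the bound are simply absent from the result. The bound is written as four optional steps in standard SPARQL 1.1 rather than the {0,4} form, and the prefix IRIs are the usual DBpedia ones, stated here as assumptions.

```sparql
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>

select ?catA ?catB (count(distinct ?super) as ?matches)
where {
  dbpedia:Bharatiya_Janata_Party dcterms:subject ?catA .   # Query I
  dbpedia:New_Delhi              dcterms:subject ?catB .   # Query II
  # Query III, generalised: ?super is reachable from both categories
  # in zero to four skos:broader steps.
  ?catA skos:broader?/skos:broader?/skos:broader?/skos:broader? ?super .
  ?catB skos:broader?/skos:broader?/skos:broader?/skos:broader? ?super .
}
group by ?catA ?catB
order by ?catA ?catB
```

Getting the Level column as well would need either one run per bound, as in the manual procedure above, or a depth count per supercategory, which this sketch leaves out.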