Remove repeated SPARQL sub-query to compare against the average

Here is a subquery that counts the number of subjects for each of 10 object values, taken from triples of the form ?sbj pq:P1810 ?obj . The predicate pq:P1810 is used because it is available in Wikidata; ?person foaf:name ?name would serve equally well, but FOAF is not available in Wikidata.
SELECT ?obj (COUNT(?sbj) AS ?num_sbj) WHERE {
?sbj pq:P1810 ?obj .
} GROUP BY ?obj LIMIT 10
I would like to select every ?obj whose ?num_sbj is higher than the average. The working solution I came up with is this:
SELECT ?obj ?num_sbj ?avg_num_sbj WHERE {
{
SELECT ?obj (COUNT(?sbj) AS ?num_sbj) WHERE {
?sbj pq:P1810 ?obj .
} GROUP BY ?obj LIMIT 10
}
{
SELECT (AVG(?num_sbj_) AS ?avg_num_sbj) WHERE {
{
SELECT ?obj_ (COUNT(?sbj_) AS ?num_sbj_) WHERE {
?sbj_ pq:P1810 ?obj_ .
} GROUP BY ?obj_ LIMIT 10
}
}
}
FILTER (?num_sbj > ?avg_num_sbj)
}
Here is a link to a running example on Wikidata: https://w.wiki/5Ci7
Unfortunately, the first sub-query has to be repeated inside the second sub-query to compute the average.
Is the subquery actually executed more than once? I feel that a query optimizer could detect that the two queries are identical and compute the average on a copy of the first result, but I'm not sure whether this optimization is always applied.
Is it possible to reformulate the working solution so that no subquery is written more than once?
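One possible reformulation, sketched under the assumption that the endpoint is Blazegraph (the engine behind the Wikidata Query Service): Blazegraph offers a non-standard named-subquery extension, written WITH { ... } AS %name and referenced with INCLUDE %name. The name %counts below is chosen for illustration only. This is not standard SPARQL 1.1 and will likely fail on other engines.

```sparql
SELECT ?obj ?num_sbj ?avg_num_sbj
WITH {
  # Written once: per-object subject counts
  SELECT ?obj (COUNT(?sbj) AS ?num_sbj) WHERE {
    ?sbj pq:P1810 ?obj .
  } GROUP BY ?obj LIMIT 10
} AS %counts
WHERE {
  # First use: the individual counts
  INCLUDE %counts
  {
    # Second use: the average over the same counts
    SELECT (AVG(?num_sbj) AS ?avg_num_sbj) WHERE {
      INCLUDE %counts
    }
  }
  FILTER (?num_sbj > ?avg_num_sbj)
}
```

The inner SELECT projects only ?avg_num_sbj, so its ?num_sbj does not join with the outer one. As far as I understand the Blazegraph documentation, a named subquery is evaluated once and its result reused at each INCLUDE, but I have not verified this against the query plan.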

Related

How to extract persons on a wikipedia list using dbpedia/sparql

I'm trying to extract all persons that won a (Gold) medal at the Olympics and ideally their birth location using the dbpedia SPARQL query. Basically it's this list I'm aiming at: https://de.wikipedia.org/wiki/Liste_der_olympischen_Medaillengewinner_aus_Spanien
I guess it must somehow work with this piece of code:
yago-res:wikicategory_Olympic_bronze_medalists_for_Spain
This doesn't work:
SELECT ?res
WHERE {
?res yago-res:wikicategory_Olympic_bronze_medalists_for_Spain .
}
Any ideas?
To get all Spanish persons who have won a gold medal at the Olympics:
select ?person where
{
?person a <http://dbpedia.org/class/yago/OlympicGoldMedalistsForSpain>
}
If you look at what DBpedia has, there is no general class:
http://dbpedia.org/class/yago/OlympicGoldMedalists
but there is
http://dbpedia.org/class/yago/OlympicGoldMedalistsForItaly
and
http://dbpedia.org/class/yago/OlympicGoldMedalistsForFrance
and
http://dbpedia.org/class/yago/OlympicGoldMedalistsForGermany
so a workaround could be:
select distinct ?person ?birthPlace where
{
?goldForCountry rdfs:subClassOf yago:Medalist110305062 .
?person a ?goldForCountry .
optional{
?person dbo:birthPlace ?birthPlace
}
filter (contains(str(?goldForCountry), "http://dbpedia.org/class/yago/OlympicGoldMedalistsFor"))
}
The birthPlace pattern should be optional because DBpedia has no birth place for 3994 of these persons.

DBPedia skipping some articles

I have a Sparql query:
SELECT DISTINCT ?film_title ?title ?year
WHERE {
?film_title rdf:type <http://dbpedia.org/ontology/Film> .
?film_title rdfs:label ?title .
?film_title <http://dbpedia.org/ontology/releaseDate> ?year .
FILTER (LANG(?title)='en')
} ORDER BY DESC(?year) LIMIT 100 OFFSET 0
I have sorted by release year in descending order, but some movies are missing, such as https://en.wikipedia.org/wiki/The_Lady_in_the_Car_with_Glasses_and_a_Gun_(2015_film). It was released on 5 Aug 2015, yet it is not in the list.
Am I doing something wrong here?
It doesn't appear that DBpedia has an entry for that film. If the article was created recently enough, it might not be in the latest DBpedia.
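One way to check this directly is an ASK query on the resource. This is a sketch that assumes DBpedia resource IRIs mirror the English Wikipedia article title; the IRI below is derived from the Wikipedia URL in the question under that assumption, not looked up in DBpedia.

```sparql
# Returns true only if DBpedia holds at least one triple
# with this resource as its subject.
ASK {
  <http://dbpedia.org/resource/The_Lady_in_the_Car_with_Glasses_and_a_Gun_(2015_film)> ?p ?o .
}
```

If this returns false, the film is absent from the loaded dataset. If it returns true, the resource may instead lack the dbo:Film type, an English rdfs:label, or a dbo:releaseDate, any of which would keep it out of the original query's results.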

sparql dbpedia.org only goal in fiorentina

Is there some way to obtain only goals that soccer players scored in Fiorentina with dbpedia.org SPARQL endpoint? I tried the following query, but unfortunately I obtain goals for each season.
select * where {
?player a dbo:SoccerPlayer.
?player dbo:team <http://dbpedia.org/resource/ACF_Fiorentina>.
?player dbp:position <http://dbpedia.org/resource/Forward_(association_football)>.
?player dbp:goals ?goal.
}
limit 10000
I think you can do this. If you browse the data for Silva, you'll see a number of career stations, e.g., station 12, each of which has a number of goals. That means you can do:
select * where {
?player a dbo:SoccerPlayer ;
dbp:position <http://dbpedia.org/resource/Forward_(association_football)> ;
dbo:careerStation ?station .
?station dbo:numberOfGoals ?goals ;
dbo:team dbr:ACF_Fiorentina .
}
Of course, a player might have multiple stations on the same team, so you'd still want to aggregate over each player and sum the goals:
select ?player (sum(?goals) as ?totalGoals) where {
?player a dbo:SoccerPlayer ;
dbp:position <http://dbpedia.org/resource/Forward_(association_football)> ;
dbo:careerStation ?station .
?station dbo:numberOfGoals ?goals ;
dbo:team dbr:ACF_Fiorentina .
}
group by ?player
Related
There are some other questions that involve querying career stations that might be useful:
SPARQL - query a property and return results for a related property (This is about getting goals, too. Does this happen to be a class assignment or something?)
Obtaining start and end date from a DBPedia CareerStation
How to build correct SPARQL Query

How to get the path length between a child and parent node using skos:broader*

I have the following query getting the terminal leaf nodes from a parent category
select distinct ?subcat where {
?subcat skos:broader* category:Buildings_and_structures_in_France_by_city .
optional { ?subsubcat skos:broader ?subcat }
}
group by ?subcat
having count(?subsubcat) = 0
How do I get the path length between the child node ?subcat and the parent node category:Buildings_and_structures_in_France_by_city such that the output would be something like?
If the real task is finding buildings and structures in France, then you can ask for things in that category of some appropriate types. E.g.,
select distinct ?building where {
values ?type { dbpedia-owl:ArchitecturalStructure
dbpedia-owl:Building
dbpedia-owl:Place }
?building a ?type ;
dcterms:subject/skos:broader* category:Buildings_and_structures_in_France_by_city
}
That returns about 700 results. If you find some that aren't in France, look at their property values and see what you could use to exclude them. Perhaps you could add a filter on latitude and longitude, or on country values, etc.
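For the path length itself, one commonly used trick is to count the categories that lie between the child and the parent. This is a sketch: the variable names ?mid and ?distance are chosen for illustration, and the category: prefix is assumed to be defined as in the question.

```sparql
select ?subcat (count(distinct ?mid) as ?distance) where {
  # ?mid is ?subcat itself or any category above it ...
  ?subcat skos:broader* ?mid .
  # ... that is still strictly below the parent category
  ?mid skos:broader+ category:Buildings_and_structures_in_France_by_city .
}
group by ?subcat
```

When there is exactly one skos:broader chain from ?subcat to the parent, the number of ?mid values equals the number of skos:broader steps. Where a category reaches the parent by several chains, the count is instead the number of distinct categories on any of those chains, so it is not the shortest path length. The parent category itself does not appear in the results, because no ?mid matches at distance 0.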

Adding more information on a simple SPARQL query

I've been running some examples with SPARQL, and it looks pretty cool.
I'm using for the moment http://dbpedia.org/snorql
I'm trying to query Salty Desserts over there.
I can list Desserts using
SELECT ?food
WHERE {
?food rdf:type <http://dbpedia.org/class/yago/Desserts>
}
ORDER BY ?name
How do I actually put on the query that the food has to be salty? Sorry if this seems to be a dumb question.
If it were sufficient for it to have salt on the list of ingredients:
SELECT DISTINCT ?food
WHERE {
?food rdf:type <http://dbpedia.org/class/yago/Desserts> .
?food <http://dbpedia.org/ontology/ingredient> :Salt .
} ORDER BY ?food
Taste seemed a good lead, but:
SELECT DISTINCT ?property ?hasValue ?isValueOf
WHERE {
{ <http://dbpedia.org/ontology/taste> ?property ?hasValue }
UNION
{ ?isValueOf ?property <http://dbpedia.org/ontology/taste> }
} ORDER BY ?property ?hasValue ?isValueOf
All of this was tested on the DBpedia SNORQL interface.