SPARQL DBpedia by taxonomic term - sparql

Have the following working SPARQL query that selects items from DBpedia that include the string "fish" in their name.
SELECT ?name, ?kingdom, ?phylum, ?class, ?order, ?family, ?genus, ?species, ?subspecies, ?img, ?abstract
WHERE {
?s dbpedia2:regnum ?hasValue;
rdfs:label ?name
FILTER regex( ?name, "fish", "i" )
FILTER ( langMatches( lang( ?name ), "EN" ))
?animal dbpedia2:name ?name;
foaf:depiction ?img;
dbpedia2:regnum ?kingdom
OPTIONAL { ?animal dbpedia2:ordo ?order . }
OPTIONAL { ?animal dbpedia2:phylum ?phylum . }
OPTIONAL { ?animal dbpedia2:classis ?class . }
OPTIONAL { ?animal dbpedia2:familia ?family . }
OPTIONAL { ?animal dbpedia2:genus ?genus . }
OPTIONAL { ?animal dbpedia2:species ?species . }
OPTIONAL { ?animal dbpedia2:subspecies ?subspecies . }
OPTIONAL {
FILTER ( langMatches( lang( ?abstract ), "EN" ))
}
}
GROUP BY ?name
LIMIT 500
Here is the result on SNORQL.
This approach finds animals with the word "fish" in their name (example: "starfish" which is not a fish but member of the phylum Echinoderm).
Would like a more precise query that selects DBpedia items by phylum, or by class, or by order, etc.
How to change the query to search only on dbpedia2:phylum (Chordata); on dbpedia2:classis (Actinopterygii); on dbpedia2:familia; etc. ?

Looking at Tuna, I see that there is a rdf:type assertion for the class
http://umbel.org/umbel/rc/Fish
that looks useful. E.g.,
select ?fish { ?fish a <http://umbel.org/umbel/rc/Fish> }
SPARQL results (10,000)
There's also the dbpedia-owl:Fish class, which gets more results:
select (count(*) as ?nFish) where {
?fish a dbpedia-owl:Fish .
}
SPARQL results (17,420)
While Wikipedia has lots of scientific classification information, I don't see much of it reflected in DBpedia. E.g,. while the Wikipedia article for Tuna has kingdom, phylum, class, order, etc., I don't see that data in the corresponding DBpedia resource.
Notes
Note that your query, as written, isn't actually legal SPARQL (even if Virtuoso, the SPARQL endpoint that DBpedia uses, accepts it). You can't have commas between the projection variables. Also, once you group by one variable, the non-group variables can't appear in the variable list. You could sample the other values though. E.g., you should end up with something like:
SELECT
?name
(sample(?kingdom) as ?kingdom_)
(sample(?phylum) as ?phylum_)
#-- ...
(sample(?img) as ?img_)
(sample(?abstract) as ?abstract_)
WHERE {
?s dbpedia2:regnum ?hasValue;
rdfs:label ?name
FILTER regex( ?name, "fish", "i" )
FILTER ( langMatches( lang( ?name ), "EN" ))
?animal dbpedia2:name ?name;
foaf:depiction ?img;
dbpedia2:regnum ?kingdom
OPTIONAL { ?animal dbpedia2:ordo ?order . }
OPTIONAL { ?animal dbpedia2:phylum ?phylum . }
OPTIONAL { ?animal dbpedia2:classis ?class . }
OPTIONAL { ?animal dbpedia2:familia ?family . }
OPTIONAL { ?animal dbpedia2:genus ?genus . }
OPTIONAL { ?animal dbpedia2:species ?species . }
OPTIONAL { ?animal dbpedia2:subspecies ?subspecies . }
OPTIONAL {
FILTER ( langMatches( lang( ?abstract ), "EN" ))
}
}
GROUP BY ?name
LIMIT 500

Related

SPARQL limit result of only one variable

I am trying to query wikidata to retrieve artworks, their material characteristics and the artistic movement they are associated with. Each resulting record can have a number of movements/materials associated with (as an artwork can be classified as belonging to two movements at the same time, or with different materials).
I would like to retrieve for each artwork only one of the movement/material associated with, as not to have duplicate lines in the results to manually remove afterwards.
How can I achieve such result using only SPARQL?
Here's my current query:
SELECT DISTINCT ?artwork ?image ?time ?creatorLabel ?movementLabel ?materialLabel WHERE {
?artwork wdt:P31 wd:Q3305213 ;
wdt:P571 ?time ;
wdt:P18 ?image .
OPTIONAL {
?artwork wdt:P170 ?creator
}
OPTIONAL {
?artwork wdt:P135 ?movement.
}
OPTIONAL {
?artwork wdt:P186 ?material.
}
FILTER(?time > "1870-01-01T00:00:00"^^xsd:dateTime)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } # Helps get the label in your language, if not, then en language
}
LIMIT 100
I tried to use COUNT and HAVING (HAVING (COUNT(?material) < 2)) to limit the result, but with such method I get a timeout. Is there any other way?
You can use SAMPLE, which picks an arbitrary value:
SELECT DISTINCT ?artwork ?image ?time ?creatorLabel (SAMPLE(?movementLabel) AS ?movementLabel_sample) (SAMPLE(?materialLabel) AS ?materialLabel_sample)
WHERE {
{
SELECT ?artwork ?image ?time ?creatorLabel ?movementLabel ?materialLabel
WHERE {
VALUES ?artwork { wd:Q728373 wd:Q720602 } # remove this line to query all artworks
?artwork wdt:P31 wd:Q3305213 ;
wdt:P571 ?time ;
wdt:P18 ?image .
OPTIONAL { ?artwork wdt:P170 ?creator . }
OPTIONAL { ?artwork wdt:P135 ?movement . }
OPTIONAL { ?artwork wdt:P186 ?material. }
FILTER(?time > "1870-01-01T00:00:00"^^xsd:dateTime)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } # Helps get the label in your language, if not, then en language
}
}
}
GROUP BY ?artwork ?image ?time ?creatorLabel
LIMIT 100
But if your only concern is
not to have duplicate lines in the results to manually remove afterwards
you could use GROUP_CONCAT to get one line per artwork, with multiple values per cell:
SELECT DISTINCT ?artwork ?image ?time ?creatorLabel (GROUP_CONCAT(DISTINCT ?movementLabel; separator=", ") AS ?movementLabels) (GROUP_CONCAT(DISTINCT ?materialLabel; separator=", ") AS ?materialLabels)
WHERE {
{
SELECT ?artwork ?image ?time ?creatorLabel ?movementLabel ?materialLabel
WHERE {
VALUES ?artwork { wd:Q728373 wd:Q720602 } # remove this line to query all artworks
?artwork wdt:P31 wd:Q3305213 ;
wdt:P571 ?time ;
wdt:P18 ?image .
OPTIONAL { ?artwork wdt:P170 ?creator . }
OPTIONAL { ?artwork wdt:P135 ?movement . }
OPTIONAL { ?artwork wdt:P186 ?material. }
FILTER(?time > "1870-01-01T00:00:00"^^xsd:dateTime)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } # Helps get the label in your language, if not, then en language
}
}
}
GROUP BY ?artwork ?image ?time ?creatorLabel
LIMIT 100
(If there can be multiple images, times, or creators, you could do the same for these properties, too.)

SPARQL query for dbpedia to find ski lifts

I'm currently trying to teach myself how to formulate SPARQL queries to extract tourism-related information from DBpedia (via http://dbpedia.org/sparql/).
So far, I've managed to get all museums for a country.
select ?thing ?type ?category ?long ?lat ?country
where
{
VALUES ?country { <http://dbpedia.org/resource/Canada> }
optional
{
?city dbo:country ?country
}
?thing dbo:location ?city.
optional
{
?thing a ?type .
VALUES ?type { dbo:Museum }
BIND( 'Museum' as ?category )
}
optional
{
?thing a ?type.
VALUES ?type { dbo:skiLift }
BIND( 'Skilift' as ?category )
}
optional
{
?thing geo:long ?long.
?thing geo:lat ?lat
}
{
?thing a dbo:Place
}
filter (BOUND (?type))
}
However, I don't understand what I need to do to also get the same information for things like dbo:skiLift, dbo:touristicSite and the like (found here: http://dbpedia.org/ontology/Place).
What am I doing wrong?
This is because both dbo:skiLift and dbo:touristicSite are properties. These resources show up in the page for Place not as subclasses of Place, but as properties which have the class Place as their domain or range. If you want to find subclasses of Place you can perform the exploratory query (which also uses property path to retrieve the transitive closure of the subClassOf property):
select ?thing
where
{
?thing rdfs:subClassOf+ <http://dbpedia.org/ontology/Place> .
}
Apart from that, I cannot understand why you use two optional clauses for different types in the same query. For example, the following query retrieves museums located at cities of Canada, possibly with their lat and lon, without the use of other optional clauses:
select ?thing ?city ?long ?lat
where
{
?city dbo:country <http://dbpedia.org/resource/Canada> .
?thing dbo:location ?city .
?thing a dbo:Museum .
optional
{
?thing geo:long ?long .
?thing geo:lat ?lat
}
}

How to get all properties for only a specific category in Wikidata?

Is there an RDF data/other format that allow me to get all the properties that can exist in a category e.g. Person, then I should be returned properties like sex, date of birth.
How to query this information at https://query.wikidata.org/ ?
What I want is this https://www.wikidata.org/wiki/Wikidata:List_of_properties/Summary_table
But is there a better format for this? I want to access programmatically.
UPDATE
This query is too heavy, causes timeout.
SELECT ?p ?attName WHERE {
?q wdt:P31 wd:Q5.
?q ?p ?statement.
?realAtt wikibase:claim ?p.
?realAtt rdfs:label ?attName.
FILTER(((LANG(?attName)) = "en") || ((LANG(?attName)) = ""))
}
GROUP BY ?p ?attName
I must specify the entity, e.g. to Barrack Obama then it works, but this does not give me the all possible properties.
SELECT ?p ?attName WHERE {
BIND(wd:Q76 AS ?q)
?q wdt:P31 wd:Q5.
?q ?p ?statement.
?realAtt wikibase:claim ?p.
?realAtt rdfs:label ?attName.
FILTER(((LANG(?attName)) = "en") || ((LANG(?attName)) = ""))
}
GROUP BY ?p ?attName
1
The page you have linked to is created by a bot. Contact the BetaBot operator, if you need to know how the bot works.
2
Perhaps the bot relies on the wd:P1963 property:
SELECT ?property ?propertyLabel {
VALUES (?class) {(wd:Q5)}
?class wdt:P1963 ?property
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} ORDER BY ASC(xsd:integer(strafter(str(?property), concat(str(wd:), "P"))))
The above query returns 49 results.
3
I'd suggest you rely on type constraints from property pages:
SELECT ?property ?propertyLabel {
VALUES (?class) {(wd:Q5)}
?property a wikibase:Property .
?property p:P2302 [ ps:P2302 wd:Q21503250 ;
pq:P2309 wd:Q21503252 ;
pq:P2308 ?class ] .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} ORDER BY ASC(xsd:integer(strafter(str(?property), concat(str(wd:), "P"))))
The above query returns 700 results.
4
The first query from your question works fine for relatively small classes, e. g. wd:Q6256 ('country'). On the public endpoint, it is not possible to make the query work for large classes.
However, you could split the query into small parts. In Python:
from wdqs import Client
from time import sleep
client = Client()
result = client.query("SELECT (count(?p) AS ?c) {?p a wikibase:Property}")
count = int(result[0]["c"])
offset = 0
limit = 50
possible = []
while offset <= count:
props = client.query("""
SELECT ?property WHERE {
hint:Query hint:optimizer "None" .
{
SELECT ?property {
?property a wikibase:Property .
} ORDER BY ?property OFFSET %s LIMIT %s
}
?property wikibase:directClaim ?wdt.
FILTER EXISTS {
?human ?wdt [] ; wdt:P31 wd:Q5 .
hint:Group hint:maxParallel 501 .
}
hint:Query hint:filterExists "SubQueryLimitOne" .
# SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
""" % (offset, limit))
for prop in props:
possible.append(prop['property'])
offset += limit
print (len(possible), min(offset, count))
sleep(0.25)
The last line of the output is:
2156 5154

SPARQL distinct rowcount not returning correct number

I have the following SPARQL query , please pay attention to the rowcount in the selection predicate and the group by clause at the end of the query.
I want the query to return the correct row count in every record , I noticed that the row count that is being returned is not correct meaning if there is a single record I get 1 or sometimes 2 and if there are more than 2 records , I still get a 2, basically it seems to return random value.
I know its because of some issue with my query , can someone please let me know what I might be doing wrong ? I am using Apache Jena.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
PREFIX d: <http://data-vocabulary.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX s: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gr: <http://purl.org/goodrelations/v1#>
SELECT DISTINCT (count(*) AS ?rowCount) ?productName ?offerImage ?offerName ?productCategory ?salePrice ?suggestedRetailPrice ?productImage ?productThumbNail ?productUrl (GROUP_CONCAT(DISTINCT ?productdescription) AS ?productdescriptions) ?productBrand ?productId ?productSku ?productModel ?productMPN ?productManufacturer ?productGtin13 ?productGtin14 ?productGtin8 ?productAvailable ?productUnAvailable ?productUsedCondition ?productNewCondition ?productColor ?productAggregateRatingValue ?productReviewCount
WHERE
{
?p2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> d:Product .
?p2 <http://data-vocabulary.org/Product/offerDetails> ?schOffer .
?schOffer <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> d:Offer .
?schOffer <http://data-vocabulary.org/Offer/price> ?salePrice
OPTIONAL
{ ?schOffer <http://data-vocabulary.org/Offer/name> ?offerName }
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/name> ?productName }
OPTIONAL
{ ?schOffer <http://data-vocabulary.org/Offer/image> ?offerImage }
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/url> ?productUrl }
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/image> ?productImage }
OPTIONAL
{ ?schOffer <http://data-vocabulary.org/Offer/category> ?productCategory }
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/description> ?productdescription }
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/brand> ?pBrandNode
OPTIONAL
{ ?pBrandNode <http://data-vocabulary.org/Brand/name> ?brandNodeName }
OPTIONAL
{ ?pBrandNode <http://data-vocabulary.org/Organization/name> ?brandNodeName }
BIND(str(coalesce(?brandNodeName, ?pBrandNode)) AS ?productBrand)
}
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/manufacturer> ?pManufacturerNode
OPTIONAL
{ ?pManufacturerNode <http://data-vocabulary.org/manufacturer/name> ?manufacturerNodeName }
BIND(str(coalesce(?manufacturerNodeName, ?pManufacturerNode)) AS ?productManufacturer)
}
OPTIONAL
{ ?p2 <http://schema.org/Product/aggregateRating> ?pAggregateRating
OPTIONAL
{ ?pAggregateRating <http://schema.org/AggregateRating/ratingValue> ?productAggregateRatingValue }
OPTIONAL
{ ?pAggregateRating <http://schema.org/AggregateRating/reviewCount> ?productReviewCount }
}
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/productID> ?productId }
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/sku> ?productSku }
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/itemCondition> ?productNewCondition .
?productNewCondition <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> s:NewCondition
}
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/itemCondition> ?productUsedCondition .
?productUsedCondition <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> s:UsedCondition
}
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/gtin13> ?productGtin13 }
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/gtin14> ?productGtin14 }
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/gtin8> ?productGtin8 }
OPTIONAL
{ ?schOffer <http://data-vocabulary.org/Offer/availability> ?productAvailable
FILTER ( ?productAvailable = s:InStock )
}
OPTIONAL
{ ?schOffer <http://data-vocabulary.org/Offer/availability> ?productUnAvailable
FILTER ( ?productUnAvailable = s:OutOfStock )
}
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/model> ?productModel }
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/mpn> ?productMPN }
OPTIONAL
{ ?p2 <http://data-vocabulary.org/Product/color> ?productColor }
}
GROUP BY ?productName ?offerImage ?offerName ?productCategory ?salePrice ?suggestedRetailPrice ?productImage ?productThumbNail ?productUrl ?productBrand ?productId ?productSku ?productModel ?productMPN ?productManufacturer ?productGtin13 ?productGtin14 ?productGtin8 ?productAvailable ?productUnAvailable ?productUsedCondition ?productNewCondition ?productColor ?productAggregateRatingValue ?productReviewCount
It is saying that the pattern you have more than one item in a group. Try running without the count and group by and look at the results where you get 2.

UNION operator in SPARQL updates

I have two SPARQL updates.First one:
INSERT
{ GRAPH <[http://example/bookStore2]> { ?book ?p ?v } }
WHERE
{ GRAPH <[http://example/bookStore]>
{ ?book dc:date ?date .
FILTER ( ?date > "1970-01-01T00:00:00-02:00"^^xsd:dateTime )
?book ?p ?v
} }
Second:
INSERT
{ GRAPH <[http://example/bookStore2]> { ?book ?p ?v } }
WHERE
{ GRAPH <[http://example/bookStore3]>
{ ?book dc:date ?date .
FILTER ( ?date > "1980-01-01T00:00:00-02:00"^^xsd:dateTime )
?book ?p ?v
} }
Can i combine them with the UNION operator? And if yes, is it an equivalent result? Is it possible to use UNION in SPARQL updates such as in "Select"?
AndyS's answer is correct; you can combine them, and the description of UNION is found in section 7 Matching Alternatives of the SPARQL specification. The combined query would be:
INSERT {
GRAPH <[http://example/bookStore2]> { ?book ?p ?v }
}
WHERE{
{
GRAPH <[http://example/bookStore]> {
?book dc:date ?date .
FILTER ( ?date > "1970-01-01T00:00:00-02:00"^^xsd:dateTime )
?book ?p ?v
}
}
UNION
{
GRAPH <[http://example/bookStore3]> {
?book dc:date ?date .
FILTER ( ?date > "1980-01-01T00:00:00-02:00"^^xsd:dateTime )
?book ?p ?v
}
}
}
In this particular case where the patterns are so similar, you could also just abstract out the differing parts with VALUES:
INSERT {
GRAPH <[http://example/bookStore2]> { ?book ?p ?v }
}
WHERE{
values (?graph ?startDate) {
(<[http://example/bookStore]> "1970-01-01T00:00:00-02:00"^^xsd:dateTime)
(<[http://example/bookStore3]> "1980-01-01T00:00:00-02:00"^^xsd:dateTime)
}
GRAPH ?graph {
?book dc:date ?date .
FILTER ( ?date > ?startDate )
?book ?p ?v
}
}
The WHERE clause is the same as SPARQL Query - you can use UNION.