SPARQL entity lists inside SELECT queries

In the following DBpedia query, is there a way to consolidate the UNIONs into a single pattern?
PREFIX prop: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
SELECT DISTINCT ?language ?label
WHERE {
{res:Spain prop:language ?language}
UNION
{res:France prop:language ?language}
UNION
{res:Italy prop:language ?language}
?language rdfs:label ?label .
FILTER langMatches(lang(?label), "en")
}
The SPARQL spec mentions something about RDF collections but I don't really understand what it's describing. It seemed like the following syntax should work, but it didn't.
PREFIX prop: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
SELECT DISTINCT ?language ?label
WHERE {
(res:Spain res:France res:Italy) prop:language ?language .
?language rdfs:label ?label .
FILTER langMatches(lang(?label), "en")
}
Is there a way to define a list (or "multiset", or "bag") of URIs like this inside a SELECT query?

In SPARQL 1.1 you can use a VALUES block:
SELECT DISTINCT ?language ?label
WHERE {
?country prop:language ?language .
?language rdfs:label ?label .
VALUES ?country { res:Spain res:France res:Italy }
FILTER langMatches(lang(?label), "en")
}
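If the country list comes from application code rather than being hard-coded, the VALUES block can be generated. A minimal Python sketch, assuming each term is already a valid prefixed name or full <IRI>; values_clause is a hypothetical helper name, and the rdfs: prefix is declared explicitly so the query does not rely on DBpedia's predefined prefixes:

```python
from typing import List


def values_clause(var: str, terms: List[str]) -> str:
    """Render a single-variable SPARQL 1.1 VALUES block.

    Terms are emitted verbatim, so they must already be valid SPARQL
    terms: prefixed names (res:Spain) or full IRIs (<http://...>).
    """
    if not terms:
        # An empty VALUES block is legal but matches nothing; treat it as a caller bug.
        raise ValueError("expected at least one term")
    return "VALUES ?" + var + " { " + " ".join(terms) + " }"


countries = values_clause("country", ["res:Spain", "res:France", "res:Italy"])
query = (
    "PREFIX prop: <http://dbpedia.org/ontology/>\n"
    "PREFIX res: <http://dbpedia.org/resource/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "SELECT DISTINCT ?language ?label\n"
    "WHERE {\n"
    "  ?country prop:language ?language .\n"
    "  ?language rdfs:label ?label .\n"
    "  " + countries + "\n"
    '  FILTER langMatches(lang(?label), "en")\n'
    "}"
)
print(query)
```

Adding a country is then a change to a Python list, not to the query text.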

Simple answer: no.
(res:Spain res:France res:Italy) prop:language ?language
means 'match where a list containing Spain, France and Italy has a language', i.e. the list itself has a language.
You could do:
?country prop:language ?language . ?language rdfs:label ?label .
FILTER ( ?country = res:Spain || ?country = res:France || ?country = res:Italy )
which is shorter, but may be slower.
(SPARQL 1.1 also has an IN operator, so the filter can be written as FILTER ( ?country IN (res:Spain, res:France, res:Italy) ).)

Related

GraphDB - Federated Query

I would like to know how to run a federated query in GraphDB. For example, how would I run the query below from GraphDB? The idea is to add its results to my local GraphDB.
#Locations of air accidents in wikidata - https://query.wikidata.org/
SELECT ?label ?coord ?place
WHERE
{
?subj wdt:P31 wd:Q744913 .
?subj wdt:P625 ?coord .
?subj rdfs:label ?label
filter (lang(?label) = "en")
}
Posting @UninformedUser's comment as an answer for better readability.
SPARQL 1.1 offers the SERVICE keyword, described in the SPARQL 1.1 Federated Query specification. You can use it to run federated queries against Wikidata directly from GraphDB.
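PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT * WHERE {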
SERVICE <https://query.wikidata.org/sparql> {
?subj wdt:P31 wd:Q744913 ;
wdt:P625 ?coord ;
rdfs:label ?label
FILTER (lang(?label) = "en")
}
}
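With SERVICE, GraphDB itself calls the remote endpoint. For comparison, a client can send the inner query straight to the Wikidata Query Service over the SPARQL 1.1 Protocol. A minimal Python sketch that only builds the request, so it runs offline; the User-Agent value is a placeholder (Wikimedia asks clients to identify themselves), and the function name is hypothetical:

```python
from urllib.parse import urlencode
from urllib.request import Request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

AIR_ACCIDENTS = """PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subj ?label ?coord WHERE {
  ?subj wdt:P31 wd:Q744913 ;
        wdt:P625 ?coord ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en")
}"""


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={
        # Standard media type for SPARQL 1.1 JSON results.
        "Accept": "application/sparql-results+json",
        # Placeholder identification string; replace with your own contact info.
        "User-Agent": "example-sparql-client/0.1",
    })


req = sparql_get_request(WDQS_ENDPOINT, AIR_ACCIDENTS)
# To actually run it (network access required):
#   import json, urllib.request
#   with urllib.request.urlopen(req) as resp:
#       rows = json.load(resp)["results"]["bindings"]
```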
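To insert the data into your local GraphDB repository, use something like this:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
INSERT {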
?subj wdt:P31 wd:Q744913 ;
wdt:P625 ?coord ;
rdfs:label ?label
} WHERE {
SERVICE <https://query.wikidata.org/sparql> {
?subj wdt:P31 wd:Q744913 ;
wdt:P625 ?coord ;
rdfs:label ?label
FILTER (lang(?label) = "en")
}
}
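To run the INSERT from a script instead of the Workbench, the update can be POSTed to the repository. A minimal sketch, assuming GraphDB's RDF4J-compatible REST API (updates go to /repositories/<id>/statements as a form-encoded update parameter) on the default port 7200; the repository ID my-repo and the function name are hypothetical:

```python
from urllib.parse import urlencode
from urllib.request import Request


def graphdb_update_request(base_url: str, repository: str, update: str) -> Request:
    """Build (but do not send) a POST that runs a SPARQL Update on one repository.

    Assumes the RDF4J-style endpoint /repositories/<repository>/statements,
    which accepts the update text as a form-encoded `update` field.
    """
    url = base_url.rstrip("/") + "/repositories/" + repository + "/statements"
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


INSERT_AIR_ACCIDENTS = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
INSERT { ?subj wdt:P31 wd:Q744913 ; wdt:P625 ?coord ; rdfs:label ?label }
WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    ?subj wdt:P31 wd:Q744913 ; wdt:P625 ?coord ; rdfs:label ?label .
    FILTER (lang(?label) = "en")
  }
}"""

# "my-repo" is a hypothetical repository ID.
req = graphdb_update_request("http://localhost:7200/", "my-repo", INSERT_AIR_ACCIDENTS)
# urllib.request.urlopen(req) would send it to a running GraphDB instance.
```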
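However, you'd probably want to unpack the coordinates and use an ontology that's easier to understand, e.g.:
PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#> # see http://prefix.cc/wgs.sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX psv: <http://www.wikidata.org/prop/statement/value/>
PREFIX wikibase: <http://wikiba.se/ontology#>
INSERT {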
?subj a :AirAccident;
wgs:lat ?lat; wgs:long ?long;
rdfs:label ?label
} WHERE {
SERVICE <https://query.wikidata.org/sparql> {
?subj wdt:P31 wd:Q744913 ;
p:P625/psv:P625 [wikibase:geoLatitude ?lat; wikibase:geoLongitude ?long];
rdfs:label ?label
FILTER (lang(?label) = "en")
}
}
For the details of p:P625, psv:P625 and wikibase:geoLatitude, see https://github.com/nichtich/wdq#wikidata-ontology (if you install wdq, the command wdq help ontology shows this color-coded).
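If you keep the simpler wdt:P625 form from the first INSERT instead of the psv: path, each coordinate is stored as one WKT literal such as Point(2.35 48.85), with longitude first. A minimal Python sketch that splits such a literal client-side; it assumes the Point(longitude latitude) shape that the Wikidata Query Service returns, and it also tolerates a leading <globe IRI>, which Wikidata prefixes to coordinates on bodies other than Earth. The function name is hypothetical:

```python
import re
from typing import Tuple

# WKT point, e.g. "Point(2.35 48.85)", optionally preceded by "<globe IRI> ".
_WKT_POINT = re.compile(
    r"^\s*(?:<[^>]*>\s*)?Point\(\s*([-+0-9.eE]+)\s+([-+0-9.eE]+)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt: str) -> Tuple[float, float]:
    """Return (latitude, longitude) from a WKT 'Point(long lat)' literal."""
    match = _WKT_POINT.match(wkt)
    if match is None:
        raise ValueError("not a WKT point: " + repr(wkt))
    # WKT puts longitude (x) before latitude (y).
    longitude, latitude = float(match.group(1)), float(match.group(2))
    return latitude, longitude


print(parse_wkt_point("Point(2.35 48.85)"))
```

Splitting in SPARQL via the psv: path, as above, keeps the stored data clean; parsing client-side is only useful when you already have the WKT form.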

SPARQL query for specific information

I am struggling a lot to create some SPARQL queries. I need three specific things, and this is what I have so far:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
select distinct ?title ?author ?country ?genre ?language
where {
?s rdf:type dbo:Book;
dbp:title ?title;
dbp:author ?author;
dbp:country ?country;
dbp:genre ?genre;
dbp:language ?language.
}
This query returns a list of all books. What I really need is the ability to add some filters to it. There are three things I want to filter by:
specific title name (e.g., search for title with "harry potter")
specific author name (e.g., search for author with "J. K. Rowling")
specific genre (e.g., search for genre with "adventure")
I've been struggling with this for too long and simply cannot write these three queries. I am trying to implement a function that executes a SPARQL query using parameters passed in from a user form. I found a few examples here and on the web, but I cannot build these three specific queries.
As noted, not every book has every property, and some of your properties may not exist at all. For instance, I changed dbp:genre to dbo:literaryGenre, based on the description of Harry Potter and the Goblet of Fire. See the query form and results.
SELECT *
WHERE
{ ?s rdf:type dbo:Book .
?s rdfs:label ?bookLabel .
FILTER(LANGMATCHES(LANG(?bookLabel), 'en'))
?s dbo:author ?author .
?author rdfs:label ?authorLabel .
FILTER(LANGMATCHES(LANG(?authorLabel), 'en'))
?authorLabel bif:contains "Rowling"
OPTIONAL { ?s dbp:country ?country .
?country rdfs:label ?countryLabel .
FILTER(LANGMATCHES(LANG(?countryLabel), 'en')) }
OPTIONAL { ?s dbo:literaryGenre ?genre .
?genre rdfs:label ?genreLabel .
FILTER(LANGMATCHES(LANG(?genreLabel), 'en')) }
OPTIONAL { ?s dbp:language ?language .
?language rdfs:label ?languageLabel .
FILTER(LANGMATCHES(LANG(?languageLabel), 'en')) }
}
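Since the question builds these filters from a user form, here is a minimal Python sketch that assembles the answer's query from optional title, author and genre values. Assumptions: it swaps Virtuoso's bif:contains for standard CONTAINS/LCASE (portable, but likely slower on DBpedia), keeps dbo:author mandatory as in the answer, and all function names are hypothetical. Form input is escaped as a SPARQL string literal rather than pasted in raw, so a quote in a title cannot break the query:

```python
from typing import Optional

PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
)

# Backslash escapes allowed in SPARQL string literals (the ECHAR rule).
_ECHAR = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r",
          "\t": "\\t", "\b": "\\b", "\f": "\\f"}


def sparql_string(text: str) -> str:
    """Quote text as a double-quoted SPARQL string literal."""
    return '"' + "".join(_ECHAR.get(ch, ch) for ch in text) + '"'


def english_label(subject: str, label_var: str) -> str:
    """Triple pattern plus a filter that keeps only English labels."""
    return (subject + " rdfs:label " + label_var + " . "
            + "FILTER(LANGMATCHES(LANG(" + label_var + '), "en"))')


def contains_filter(var: str, needle: str) -> str:
    """Case-insensitive substring match in standard SPARQL 1.1."""
    return ("FILTER(CONTAINS(LCASE(STR(" + var + ")), "
            + sparql_string(needle.lower()) + "))")


def book_query(title: Optional[str] = None,
               author: Optional[str] = None,
               genre: Optional[str] = None) -> str:
    """Build the DBpedia book query; each non-empty form field adds one filter."""
    body = [
        "?s rdf:type dbo:Book .",
        english_label("?s", "?bookLabel"),
        "?s dbo:author ?author .",
        english_label("?author", "?authorLabel"),
    ]
    if title:
        body.append(contains_filter("?bookLabel", title))
    if author:
        body.append(contains_filter("?authorLabel", author))
    genre_part = "?s dbo:literaryGenre ?genre . " + english_label("?genre", "?genreLabel")
    if genre:
        # Filtering on genre requires the genre pattern to be mandatory, not OPTIONAL.
        body.append(genre_part)
        body.append(contains_filter("?genreLabel", genre))
    else:
        body.append("OPTIONAL { " + genre_part + " }")
    return PREFIXES + "SELECT DISTINCT * WHERE {\n  " + "\n  ".join(body) + "\n}"


print(book_query(author="J. K. Rowling"))
```

The genre block leaves OPTIONAL only when that field is filled in: a FILTER inside OPTIONAL would merely blank out ?genreLabel instead of removing books of other genres.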

Free text search in sparql when you have multiword and scaping character

I am wondering how to search in a SPARQL query for a phrase like:
Robert J. O'Neill
I am looking for the resources whose label contains this multiword unit, which includes a quote or other Unicode character.
SELECT DISTINCT ?resource ?abstract
WHERE {?resource rdfs:label ?s.
?s <bif:contains> "'Robert J. O'Neill'" .
?resource dbo:abstract ?abstract
}
Here is a query that returns all resources whose label contains "Robert J. O'Neill", ignoring case:
SELECT DISTINCT ?s WHERE
{
?s rdfs:label ?label .
FILTER(regex(?label, "Robert J. O'Neill", "i"))
}
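In the regex query above, the "." in "J." is a regex metacharacter that matches any character, and user input containing characters such as ( or + can make the pattern invalid. A minimal Python sketch that escapes the phrase for SPARQL's REGEX (which uses XPath/XSD regex syntax) and then for the string literal, which doubles each backslash; the helper names are hypothetical:

```python
# Metacharacters of the XPath/XSD regex syntax that SPARQL's REGEX function uses.
_REGEX_META = set("\\|.?*+(){}[]^$-")

# Backslash escapes allowed in SPARQL string literals (the ECHAR rule).
_ECHAR = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r",
          "\t": "\\t", "\b": "\\b", "\f": "\\f"}


def regex_escape(text: str) -> str:
    """Backslash-escape regex metacharacters so text is matched literally."""
    return "".join("\\" + ch if ch in _REGEX_META else ch for ch in text)


def sparql_string(text: str) -> str:
    """Quote text as a double-quoted SPARQL string literal."""
    return '"' + "".join(_ECHAR.get(ch, ch) for ch in text) + '"'


def label_regex_query(phrase: str) -> str:
    """Find resources whose label contains phrase literally, ignoring case."""
    pattern = sparql_string(regex_escape(phrase))
    return "\n".join([
        "SELECT DISTINCT ?s WHERE",
        "{",
        "?s rdfs:label ?label .",
        "FILTER(regex(?label, " + pattern + ', "i"))',
        "}",
    ])


print(label_regex_query("Robert J. O'Neill"))
```

The apostrophe needs no escaping inside a double-quoted SPARQL literal; only the regex dot and the resulting backslash do.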
If you need to match a specific string, this is faster:
SELECT DISTINCT ?s WHERE
{
?s rdfs:label ?label .
?label bif:contains "Robert J. O'Neill"
}
Be aware, though, that Virtuoso rejects such a query because of the spaces in the string. One way to avoid this:
SELECT DISTINCT * WHERE
{
?s rdfs:label ?label .
?label bif:contains "Robert" .
FILTER (CONTAINS(?label, " J. O'Neill"))
}
I found the following code faster than the regex:
SELECT ?s WHERE { ?s rdfs:label ?o FILTER ( bif:contains ( ?o, '"Robert" AND "J." AND "Neill"' ) ) }

Aggregate properties

I'm developing my own Fuseki endpoint from some DBpedia data.
I'm unsure how to aggregate properties related to a single resource.
SELECT ?name ?website ?abstract ?genre ?image
WHERE{
VALUES ?s {<http://dbpedia.org/resource/Attack_Attack!>}
?s foaf:name ?name ;
dbo:abstract ?abstract .
OPTIONAL { ?s dbo:genre ?genre } .
OPTIONAL { ?s dbp:website ?website } .
OPTIONAL { ?s dbo:image ?image } .
FILTER LANGMATCHES(LANG(?abstract ), "en")
}
SPARQL endpoint: http://dbpedia.org/sparql/
This query returns two matching results, which differ only in the dbo:genre value. Is there a way to query the knowledge base and retrieve a single result with a list of genres?
@chrisis's query works well on the DBpedia SPARQL endpoint, which is based on Virtuoso.
However, if you are using Jena Fuseki, you should use more conformant syntax:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT
?name
(SAMPLE(?website) AS ?sample_website)
(SAMPLE(?abstract) AS ?sample_abstract)
(SAMPLE(?image) AS ?sample_image)
(GROUP_CONCAT(?genre; separator=', ') AS ?genres)
WHERE {
VALUES (?s) {(<http://dbpedia.org/resource/Attack_Attack!>)}
?s foaf:name ?name ;
dbo:abstract ?abstract .
OPTIONAL { ?s dbo:genre ?genre } .
OPTIONAL { ?s dbp:website ?website } .
OPTIONAL { ?s dbo:image ?image} .
FILTER LANGMATCHES(LANG(?abstract ), "en")
} GROUP BY ?name
The differences from @chrisis's query are:
Since GROUP_CONCAT is an aggregate function and ?name is projected without aggregation, ?name must appear in GROUP BY;
Since GROUP BY is used, all non-grouping variables must be aggregated (e.g. via SAMPLE);
GROUP_CONCAT takes its separator as a separator= argument after a semicolon, rather than as a second positional argument.
In Fuseki, these AS in the projection are in fact superfluous: see this question and comments.
Yes, the GROUP_CONCAT() function is what you want.
SELECT ?name ?website ?abstract (GROUP_CONCAT(?genre,',') AS ?genres) ?image
WHERE{
<http://dbpedia.org/resource/Attack_Attack!> a dbo:Band ;
foaf:name ?name;
dbo:abstract ?abstract .
OPTIONAL{ <http://dbpedia.org/resource/Attack_Attack!> dbo:genre ?genre } .
OPTIONAL{ <http://dbpedia.org/resource/Attack_Attack!> dbp:website ?website} .
OPTIONAL{ <http://dbpedia.org/resource/Attack_Attack!> dbo:image ?image} .
FILTER LANGMATCHES(LANG(?abstract ), "en")
}
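Because aggregate syntax differs between Virtuoso and Fuseki, another option is to leave the query ungrouped and merge the rows in the client. A minimal Python sketch over the standard SPARQL 1.1 JSON results format; the function name and the sample rows (including the example.org genre IRIs) are made up for illustration:

```python
from typing import Dict, List


def collect_values(results: dict, key: str, multi: str) -> Dict[str, List[str]]:
    """Merge SPARQL JSON result rows that differ only in `multi`.

    `results` is a parsed application/sparql-results+json document. Rows are
    grouped by the value of `key`; distinct values of `multi` are kept in
    first-seen order, and rows where `multi` is unbound (OPTIONAL) add nothing.
    """
    grouped: Dict[str, List[str]] = {}
    for row in results["results"]["bindings"]:
        values = grouped.setdefault(row[key]["value"], [])
        if multi in row and row[multi]["value"] not in values:
            values.append(row[multi]["value"])
    return grouped


# Two made-up rows shaped like the question's output: same band, different genre.
sample = {
    "head": {"vars": ["name", "genre"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "Attack Attack!"},
         "genre": {"type": "uri", "value": "http://example.org/genre/A"}},
        {"name": {"type": "literal", "value": "Attack Attack!"},
         "genre": {"type": "uri", "value": "http://example.org/genre/B"}},
    ]},
}
print(collect_values(sample, "name", "genre"))
```

This keeps the SPARQL portable across endpoints at the cost of transferring one row per genre.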

SPARQL: Matching Literals with **ANY** Language Tags without Running into a Timeout

I need to select the entities that have a "taxon rank (P105)" of "species (Q7432)" and a label that matches a literal string such as "Topinambur".
I'm testing the queries on https://query.wikidata.org;
this query works and returns the entity with a satisfactory response time:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?entity rdfs:label "Topinambur"@de .
?entity wdt:P105 wd:Q7432.
}
LIMIT 100
The problem is that my requirement is not to specify the language: the labels in the underlying dataset (Wikidata) carry language tags, so I need a way to test literal equality for any language.
I tried several possible solutions, but every query I tried resulted in the following:
TIMEOUT message com.bigdata.bop.engine.QueryTimeoutException: Query deadline is expired
Here is the list of what I tried (and I always got a TIMEOUT):
1) based on this answer I tried:
SELECT * WHERE {
?entity rdfs:label ?label FILTER ( str( ?label ) = "Topinambur") .
?entity wdt:P105 wd:Q7432.
}
LIMIT 100
2) based on some other documentation I tried:
SELECT * WHERE {
?entity wdt:P105 wd:Q7432.
?entity rdfs:label ?label FILTER regex(?label, "^Topinambur") .
}
LIMIT 100
3) and
SELECT * WHERE {
?entity wdt:P105 wd:Q7432.
?entity rdfs:label ?label .
FILTER langMatches( lang(?label), "*" )
FILTER (?label = "Topinambur")
}
LIMIT 100
What I'm looking for is a performant solution, or some SPARQL syntax that doesn't end in a TIMEOUT.
PS: with reference to http://www.rfc-editor.org/rfc/bcp/bcp47.txt, I can't tell whether language ranges or wildcards could help in some way.
EDIT
I successfully tested (without a timeout) a similar query on DBpedia, using the Virtuoso query editor at:
https://dbpedia.org/sparql
Default Data Set Name (Graph IRI): http://dbpedia.org
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?resource
WHERE {
?resource rdfs:label ?label . FILTER ( str( ?label ) = "Topinambur").
?resource rdf:type dbo:Species
}
LIMIT 100
I am still very interested in understanding the performance problem that I experience on Wikidata and what is the best syntax to use.
I solved a similar problem: finding an entity by a label string in any language. I recommend not using FILTER, because it is too slow. Use UNION instead, like this:
SELECT ?entity WHERE {
?entity wdt:P105 wd:Q7432.
{ ?entity rdfs:label "Topinambur"@de . }
UNION { ?entity rdfs:label "Topinambur"@en . }
UNION { ?entity rdfs:label "Topinambur"@fr . }
}
GROUP BY ?entity
LIMIT 100
This solution is not perfect, because you have to enumerate all languages, but it is fast and reliable. The list of all available Wikidata languages is here.
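The enumeration itself can be generated. A minimal Python sketch that builds one UNION branch per language tag from a list you would fill from the Wikidata language list; the function name is hypothetical, the question's taxon-rank pattern is the default constraint, and no PREFIX lines are emitted because the Wikidata Query Service predefines wd:, wdt: and rdfs:. Tags are checked against the SPARQL LANGTAG grammar so arbitrary input cannot be spliced into the query:

```python
import re
from typing import Iterable

# SPARQL LANGTAG grammar: letters, then optional "-" subtags of letters/digits.
_LANGTAG = re.compile(r"[A-Za-z]+(-[A-Za-z0-9]+)*")

# Backslash escapes for a double-quoted SPARQL string literal.
_ECHAR = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r"}


def any_language_label_query(text: str, languages: Iterable[str],
                             constraint: str = "?entity wdt:P105 wd:Q7432 .") -> str:
    """Build the UNION-of-exact-labels query for a list of language tags."""
    literal = '"' + "".join(_ECHAR.get(ch, ch) for ch in text) + '"'
    branches = []
    for lang in languages:
        if not _LANGTAG.fullmatch(lang):
            raise ValueError("invalid language tag: " + repr(lang))
        branches.append("{ ?entity rdfs:label " + literal + "@" + lang + " . }")
    if not branches:
        raise ValueError("need at least one language tag")
    return "\n".join([
        "SELECT ?entity WHERE {",
        "  " + constraint,
        "  " + "\n  UNION ".join(branches),
        "}",
        "GROUP BY ?entity",
        "LIMIT 100",
    ])


print(any_language_label_query("Topinambur", ["de", "en", "fr", "zh-min-nan"]))
```

Each branch is an exact lookup of a fully specified literal, presumably what lets the store use its index instead of scanning labels as the FILTER variants do. A VALUES block listing the same tagged literals, as in the lexeme queries further down, expresses the same lookup.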
A related answer proposes three options:
1) Be more specific. The ?entity wdt:P171+ wd:Q25314 pattern seems to be sufficiently selective in your case.
2) Wait until full-text search is implemented.
3) Use Quarry (example query).
Another option is to use Virtuoso's full-text search capabilities on wikidata.dbpedia.org:
SELECT ?s WHERE {
?resource rdfs:label ?label .
?label bif:contains "'topinambur'" .
BIND ( IRI ( REPLACE ( STR(?resource),
"http://wikidata.dbpedia.org/resource",
"http://www.wikidata.org/entity"
)
) AS ?s
)
}
It seems that even the query below sometimes works on wikidata.dbpedia.org without timing out:
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?resource WHERE {
?resource rdfs:label ?label .
FILTER ( STR(?label) = "Topinambur" ) .
}
Two hours ago I removed this statement from Wikidata:
wd:Q161378 rdfs:label "topinambur"@ru .
I'm not a botanist, but 'topinambur' is definitely not a word in Russian.
Building on @quick's answer, here it is for lexemes rather than labels. First, identify the relevant language codes:
SELECT (GROUP_CONCAT(?mword; separator=" ") AS ?mwords) {
BIND(1 AS ?dummy)
VALUES ?word { "topinambur" }
{
SELECT (COUNT(?lexeme) AS ?count) ?language_code {
?lexeme dct:language / wdt:P424 ?language_code .
}
GROUP BY ?language_code
HAVING (?count > 100)
ORDER BY DESC(?count)
}
BIND(CONCAT('"', ?word, '"@', ?language_code) AS ?mword)
}
GROUP BY ?dummy
Then paste the resulting list into the VALUES clause of the verbose query:
SELECT (COUNT(?lexeme) AS ?count) ?language (GROUP_CONCAT(?word; separator=" ") AS ?words) {
VALUES ?word { "topinambur"@eo "topinambur"@ko "topinambur"@bfi "topinambur"@nl "topinambur"@uk "topinambur"@cy "topinambur"@pt "topinambur"@zh "topinambur"@br "topinambur"@bg "topinambur"@ms "topinambur"@tg "topinambur"@se "topinambur"@ta "topinambur"@non "topinambur"@it "topinambur"@zh-min-nan "topinambur"@nan "topinambur"@fi "topinambur"@jbo "topinambur"@ml "topinambur"@ja "topinambur"@ku "topinambur"@bn "topinambur"@ar "topinambur"@nb "topinambur"@es "topinambur"@pl "topinambur"@nn "topinambur"@sk "topinambur"@da "topinambur"@de "topinambur"@cs "topinambur"@fr "topinambur"@sv "topinambur"@eu "topinambur"@he "topinambur"@la "topinambur"@en "topinambur"@ru }
?lexeme dct:language ?language ;
ontolex:lexicalForm / ontolex:representation ?word .
}
GROUP BY ?language
For querying on labels, do something similar to:
SELECT (COUNT(?item) AS ?count) ?language (GROUP_CONCAT(?word; separator=" ") AS ?words) {
VALUES ?word { "topinambur"@eo "topinambur"@ko "topinambur"@bfi "topinambur"@nl "topinambur"@uk "topinambur"@cy "topinambur"@pt "topinambur"@zh "topinambur"@br "topinambur"@bg "topinambur"@ms "topinambur"@tg "topinambur"@se "topinambur"@ta "topinambur"@non "topinambur"@it "topinambur"@zh-min-nan "topinambur"@nan "topinambur"@fi "topinambur"@jbo "topinambur"@ml "topinambur"@ja "topinambur"@ku "topinambur"@bn "topinambur"@ar "topinambur"@nb "topinambur"@es "topinambur"@pl "topinambur"@nn "topinambur"@sk "topinambur"@da "topinambur"@de "topinambur"@cs "topinambur"@fr "topinambur"@sv "topinambur"@eu "topinambur"@he "topinambur"@la "topinambur"@en "topinambur"@ru }
?item rdfs:label ?word .
BIND(LANG(?word) AS ?language)
}
GROUP BY ?language