SPARQL: exclude multiple types, including the type hierarchy

In DBpedia I select some pages with labels starting with 'A'. Here I'm using an additional filter by subject to narrow the set. In the original version there are other conditions (the result set is much bigger):
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX purl: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <http://dbpedia.org/page/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT
?pageType
WHERE
{
{
?page rdfs:label ?label .
?page a ?pageType .
?page <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Banking> .
}
FILTER ( strstarts(str(?pageType), 'http://dbpedia.org/ontology') )
}
LIMIT 1000
Here I select only the page types, to keep the rest of the question clear.
This is the whole set. Now I want to exclude some pages, starting with all agents (persons, organizations, etc.):
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX purl: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <http://dbpedia.org/page/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT
?pageType
WHERE
{
{
?page rdfs:label ?label .
?page a ?pageType .
?page <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Banking> .
MINUS { ?page a dbo:Agent }
}
FILTER ( strstarts(str(?pageType), 'http://dbpedia.org/ontology') )
}
LIMIT 1000
OK. Then I want to exclude more types, for example dbo:WrittenWork. I tried different approaches but was unable to find the correct one.
This returns nothing:
WHERE
{
{
?page rdfs:label ?label .
?page a ?pageType .
?page <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Banking> .
MINUS { ?page a dbo:Agent }
MINUS { ?page a dbo:WrittenWork }
}
FILTER ( strstarts(str(?pageType), 'http://dbpedia.org/ontology') )
}
This behaves as if no filter were set:
WHERE
{
{
?page rdfs:label ?label .
?page a ?pageType .
?page <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Banking> .
MINUS { ?page a dbo:Agent, dbo:WrittenWork }
}
FILTER ( strstarts(str(?pageType), 'http://dbpedia.org/ontology') )
}
The question is:
how should I exclude pages of certain types, whether the type is assigned directly or inherited through a superclass?
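Why the comma form does nothing follows from standard SPARQL semantics: `?page a dbo:Agent, dbo:WrittenWork` is shorthand for two triple patterns on the same ?page, so that single MINUS block only removes pages typed as both. Separate MINUS blocks, by contrast, each remove pages of one type. The Python sketch below models the two cases over a few hypothetical pages and types (made-up data, not DBpedia):

```python
# Toy model of SPARQL MINUS with hypothetical pages and types (not DBpedia data).
# Each page maps to the set of its rdf:type values.
PAGE_TYPES = {
    "BankA": {"dbo:Company", "dbo:Organisation", "dbo:Agent"},
    "BankingBook": {"dbo:Book", "dbo:WrittenWork", "dbo:Work"},
    "BankingCrisis": {"dbo:Event"},
}


def minus_combined(pages, excluded):
    """MINUS { ?page a T1, T2 }: remove a page only if it has ALL the excluded types."""
    return {p for p in pages if not excluded <= PAGE_TYPES[p]}


def minus_separate(pages, excluded):
    """MINUS { ?page a T1 } MINUS { ?page a T2 }: remove a page if it has ANY of them."""
    return {p for p in pages if not excluded & PAGE_TYPES[p]}


excluded = {"dbo:Agent", "dbo:WrittenWork"}
print(sorted(minus_combined(PAGE_TYPES, excluded)))  # ['BankA', 'BankingBook', 'BankingCrisis']
print(sorted(minus_separate(PAGE_TYPES, excluded)))  # ['BankingCrisis']
```

No page here has both types, so the combined form removes nothing, which matches the "as if no filter were set" behaviour.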

It looks like this is a working answer (how to exclude multiple types):
{
?page purl:subject ?id .
?page a ?pageType .
FILTER NOT EXISTS {
?page a/rdfs:subClassOf* ?skipClasses .
FILTER(?skipClasses in (dbo:Agent, dbo:Place, dbo:Work))
}
}
In this example all instances of dbo:Agent, dbo:Place and dbo:Work, including instances of their subclasses, are filtered out.
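A rough model of what `a/rdfs:subClassOf*` does: the path follows rdf:type once and then zero or more rdfs:subClassOf steps, so a page is excluded if any of its types is, or descends from, one of the skipped classes. The sketch walks a tiny hypothetical class hierarchy in Python; the class names mimic DBpedia's, but the hierarchy and pages are invented for illustration:

```python
# Toy model of:
#   FILTER NOT EXISTS { ?page a/rdfs:subClassOf* ?skip .
#                       FILTER(?skip IN (dbo:Agent, dbo:Place, dbo:Work)) }
# The hierarchy and pages are invented for illustration, not taken from DBpedia.
SUBCLASS_OF = {  # class -> set of direct superclasses
    "dbo:Bank": {"dbo:Company"},
    "dbo:Company": {"dbo:Organisation"},
    "dbo:Organisation": {"dbo:Agent"},
    "dbo:Book": {"dbo:WrittenWork"},
    "dbo:WrittenWork": {"dbo:Work"},
}
PAGE_TYPES = {
    "SomeBank": {"dbo:Bank"},
    "SomeBook": {"dbo:Book"},
    "SomeCrisis": {"dbo:Event"},
}
SKIP = {"dbo:Agent", "dbo:Place", "dbo:Work"}


def superclasses(cls):
    """All classes reachable by rdfs:subClassOf*, including cls itself (zero steps)."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in seen:  # the seen-set also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def keep(page):
    """True if no rdf:type of the page reaches a skipped class."""
    return not any(superclasses(t) & SKIP for t in PAGE_TYPES[page])


print([p for p in PAGE_TYPES if keep(p)])  # ['SomeCrisis']
```

The `*` (zero or more) is what makes direct types and inherited types behave the same way: a page typed only as dbo:Agent is excluded after zero steps, a dbo:Bank after three.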

Related

SPARQL query for specific information

I am struggling a lot to create some SPARQL queries. I need 3 specific things, and this is what I have so far:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
select distinct ?title ?author ?country ?genre ?language
where {
?s rdf:type dbo:Book;
dbp:title ?title;
dbp:author ?author;
dbp:country ?country;
dbp:genre ?genre;
dbp:language ?language.
}
This query gives me a list of all books. What I really need is the ability to add some filters to this code. There are 3 things I want to filter by:
specific title name (e.g., search for title with "harry potter")
specific author name (e.g., search for author with "J. K. Rowling")
specific genre (e.g., search for genre with "adventure")
I've been struggling with this for too long and I simply cannot define these 3 queries. I am trying to implement a function that executes a SPARQL statement using parameters passed by a user form. I found a few examples here and on the web, but I just cannot build these 3 specific queries.
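For the form-driven function, one approach is to build the query text from whichever fields the user filled in, escaping each value as a SPARQL string literal so that quotes in the input cannot break the query. This is a sketch, not a tested client: the property choices (rdfs:label for the title, dbo:author and dbo:literaryGenre for author and genre) follow the answer below, and using standard CONTAINS/LCASE instead of Virtuoso's bif:contains is an assumption that trades speed for portability.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint


def sparql_string(value):
    """Escape user input as a double-quoted SPARQL string literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return '"' + escaped + '"'


def contains_filter(var, text):
    """Case-insensitive substring test using standard SPARQL functions."""
    return f"  FILTER(CONTAINS(LCASE(STR({var})), LCASE({sparql_string(text)})))"


def build_book_query(title="", author="", genre="", limit=100):
    """Build a book query; only the fields the user filled in become filters."""
    lines = [
        "PREFIX dbo: <http://dbpedia.org/ontology/>",
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        "SELECT DISTINCT ?book ?title ?authorName ?genreName WHERE {",
        "  ?book a dbo:Book ; rdfs:label ?title .",
        '  FILTER(LANGMATCHES(LANG(?title), "en"))',
    ]
    if title:
        lines.append(contains_filter("?title", title))
    # Author and genre are required only when searched for; otherwise OPTIONAL.
    for text, path, var in ((author, "dbo:author/rdfs:label", "?authorName"),
                            (genre, "dbo:literaryGenre/rdfs:label", "?genreName")):
        pattern = f'?book {path} {var} . FILTER(LANGMATCHES(LANG({var}), "en"))'
        if text:
            lines.append("  " + pattern)
            lines.append(contains_filter(var, text))
        else:
            lines.append("  OPTIONAL { " + pattern + " }")
    lines.append("}")
    lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


def build_request(query):
    """GET request in the style of the SPARQL 1.1 Protocol, asking for JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


print(build_book_query(author="Rowling", genre="fantasy"))
```

To actually run a query, the request can be passed to `urllib.request.urlopen` and the body parsed with `json.load`; this assumes the endpoint honours the standard `Accept` header for JSON results.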
As noted, not every book has every property, and some of your properties may not exist at all. For instance, I changed dbp:genre to dbo:literaryGenre, based on the description of Harry Potter and the Goblet of Fire. See the query form and the results.
SELECT *
WHERE
{ ?s rdf:type dbo:Book .
?s rdfs:label ?bookLabel .
FILTER(LANGMATCHES(LANG(?bookLabel), 'en'))
?s dbo:author ?author .
?author rdfs:label ?authorLabel .
FILTER(LANGMATCHES(LANG(?authorLabel), 'en'))
?authorLabel bif:contains "Rowling"
OPTIONAL { ?s dbp:country ?country .
?country rdfs:label ?countryLabel .
FILTER(LANGMATCHES(LANG(?countryLabel), 'en')) }
OPTIONAL { ?s dbo:literaryGenre ?genre .
?genre rdfs:label ?genreLabel .
FILTER(LANGMATCHES(LANG(?genreLabel), 'en')) }
OPTIONAL { ?s dbp:language ?language .
?language rdfs:label ?languageLabel .
FILTER(LANGMATCHES(LANG(?languageLabel), 'en')) }
}

Duplicated results from Wikidata

I created the following SPARQL query for Wikidata. The results are records for the states of Germany, but as you can see, each result occurs four times in a row (you can test it here: https://query.wikidata.org/). I suspected a problem with the geo coordinates and languages, but I can't resolve it. What is wrong with this query, and how can I fix it to get results without repetition?
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX schema: <http://schema.org/>
PREFIX psv: <http://www.wikidata.org/prop/statement/value/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT DISTINCT ?subject ?featureCode ?countryCode ?name ?latitude ?longitude ?description ?iso31662
WHERE
{ ?subject wdt:P31 wd:Q1221156 ;
rdfs:label ?name ;
wdt:P17 ?countryClass .
?countryClass
wdt:P297 ?countryCode .
?subject wdt:P31/(wdt:P279)* ?adminArea .
?adminArea wdt:P2452 "A.ADM1" ;
wdt:P2452 ?featureCode .
?subject wdt:P300 ?iso31662
OPTIONAL
{ ?subject schema:description ?description
FILTER ( lang(?description) = "en" )
?subject p:P625 ?coordinate .
?coordinate psv:P625 ?coordinateNode .
?coordinateNode
wikibase:geoLatitude ?latitude ;
wikibase:geoLongitude ?longitude
}
FILTER ( lang(?name) = "en" )
FILTER EXISTS { ?subject wdt:P300 ?iso31662 }
}
ORDER BY lcase(?name)
OFFSET 0
LIMIT 200
In short, "9.0411111111111"^^xsd:double and "9.0411111111111"^^xsd:decimal are distinct RDF terms, so SELECT DISTINCT keeps both rows, though the two values might be considered equal in some sense.
Check the datatypes with this:
SELECT DISTINCT ?subject ?featureCode ?countryCode ?name ?description ?iso31662
(datatype(?latitude) AS ?lat)
(datatype(?longitude) AS ?long)
and then cast both coordinates to a single datatype:
SELECT DISTINCT ?subject ?featureCode ?countryCode ?name ?description ?iso31662
(xsd:decimal(?latitude) AS ?lat)
(xsd:decimal(?longitude) AS ?long)
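The point about DISTINCT can be modelled in a few lines: SPARQL compares RDF terms, so two literals with the same lexical form but different datatypes are different values for DISTINCT. If both latitude and longitude come back in two datatypes, that would be consistent with four copies per state. The Python below uses hypothetical rows and only models the idea; it does not reproduce SPARQL's exact casting rules:

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical rows: one state whose coordinate value is returned with two datatypes.
rows = [
    ("state-1", ("9.0411111111111", XSD + "double")),
    ("state-1", ("9.0411111111111", XSD + "decimal")),
]

# DISTINCT compares RDF terms, so lexical form AND datatype must both match.
distinct_terms = set(rows)

# Converting to one numeric type first (like xsd:decimal(?latitude)) merges them.
distinct_cast = {(state, Decimal(lexical)) for state, (lexical, _datatype) in rows}

print(len(distinct_terms), len(distinct_cast))  # 2 1
```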

How can I get a list of books from Wikibooks with a SPARQL query?

How can I get a list of books from Wikibooks with a SPARQL query? For example:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dba: <http://dbpedia.org/ontology/>
SELECT ?author ?name ?label ?text ?title ?isbn ?publisher ?literaryGenre ?pages WHERE
{?book a dbo:Book.
?book dbo:author ?author.
?book dbo:numberOfPages ?pages.
?book dbp:title ?title.
?book dba:isbn ?isbn.
?book dba:publisher ?publisher.
FILTER regex(?title , "java") .
}
I'm wondering whether you know that Wikibooks is not Wikipedia and DBpedia is based on Wikipedia?!
And then, why do you have two prefixes dbo and dba for the same namespace http://dbpedia.org/ontology/? I really suggest understanding what you're doing and what the query does instead of copying and pasting from other sources. SPARQL and RDF tutorials might help, and the official documentation is useful too.
Next issue: you SELECT the variables ?name, ?label, ?text and ?literaryGenre, which are not bound in any triple pattern in the WHERE part. It's also not clear what you expect to get for ?text. The whole text of the book?! That certainly won't exist; think about copyright.
And what would be the difference between ?name and ?title? I don't think that dbp:title is the appropriate property here; see:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT count(*) WHERE {
?book a dbo:Book ;
dbp:title ?title.
}
which returns only 19.
My suggestion:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?book a dbo:Book .
?book dbo:author ?author .
OPTIONAL { ?book dbo:numberOfPages ?pages }
OPTIONAL { ?book dbo:isbn ?isbn }
OPTIONAL { ?book dbo:publisher ?publisher }
# get the English title
?book rdfs:label ?name.
FILTER(LANGMATCHES(LANG(?name), 'en'))
# get an English description, but not the text
?book rdfs:comment ?text .
FILTER(LANGMATCHES(LANG(?text), 'en'))
# filter for books whose title contains "java"
FILTER regex(str(?name) , "java", "i") .
}
It is more efficient to use the Virtuoso full-text index predicate bif:contains:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?book a dbo:Book .
?book dbo:author ?author .
OPTIONAL { ?book dbo:numberOfPages ?pages }
OPTIONAL { ?book dbo:isbn ?isbn }
OPTIONAL { ?book dbo:publisher ?publisher }
# get the English title
?book rdfs:label ?name.
FILTER(LANGMATCHES(LANG(?name), 'en'))
# get an English description, but not the text
?book rdfs:comment ?text .
FILTER(LANGMATCHES(LANG(?text), 'en'))
# filter for books whose title contains "java"
?name bif:contains '"java"'
}
As a book might have multiple authors or publishers, you might get duplicate rows. Here GROUP BY in combination with GROUP_CONCAT is the way to go (grouped by book):
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?book (group_concat(DISTINCT ?author; separator = ", ") as ?authors) (group_concat(DISTINCT ?publisher; separator = ", ") as ?publishers) (sample(?pages) as ?numPages) (sample(?isbn_tmp) as ?isbn) WHERE {
?book a dbo:Book .
?book dbo:author ?author .
OPTIONAL { ?book dbo:numberOfPages ?pages }
OPTIONAL { ?book dbo:isbn ?isbn_tmp }
OPTIONAL { ?book dbo:publisher ?publisher }
# get the English title
?book rdfs:label ?name.
FILTER(LANGMATCHES(LANG(?name), 'en'))
# get an English description, but not the text
?book rdfs:comment ?text .
FILTER(LANGMATCHES(LANG(?text), 'en'))
# filter for books whose title contains "java"
?name bif:contains '"java"'
}
GROUP BY ?book
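What GROUP BY ?book with GROUP_CONCAT(DISTINCT ...) does to the rows can be mimicked in plain Python. The rows are hypothetical. Note that SPARQL leaves the order inside GROUP_CONCAT unspecified, whereas this sketch keeps first-seen order:

```python
# Hypothetical ungrouped rows (book, author, publisher): Book1 has two authors.
rows = [
    ("Book1", "AuthorA", "PubX"),
    ("Book1", "AuthorB", "PubX"),
    ("Book2", "AuthorC", "PubY"),
]


def group_concat_by_book(rows, sep=", "):
    """Mimic GROUP BY ?book with GROUP_CONCAT(DISTINCT ...; separator = sep)."""
    groups = {}
    for book, author, publisher in rows:
        authors, publishers = groups.setdefault(book, ([], []))
        if author not in authors:  # DISTINCT inside the aggregate
            authors.append(author)
        if publisher not in publishers:
            publishers.append(publisher)
    return {book: (sep.join(a), sep.join(p)) for book, (a, p) in groups.items()}


for book, (authors, publishers) in group_concat_by_book(rows).items():
    print(book, "|", authors, "|", publishers)
```

Three input rows collapse to one row per book, with "PubX" appearing once for Book1 because of the DISTINCT.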

Filtering DBpedia disambiguation page

I have a SPARQL query, and I want to eliminate all disambiguation resources. How can I do this? This is my query:
prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
select distinct ?Nom ?resource ?url where {
?resource rdfs:label ?Nom.
?resource foaf:isPrimaryTopicOf ?url.
FILTER (langMatches( lang(?Nom), "EN" )).
?Nom <bif:contains> "Apple".
}
You can add the following prefix and filter to your query:
prefix dbo: <http://dbpedia.org/ontology/>
filter not exists {
?resource dbo:wikiPageRedirects*/dbo:wikiPageDisambiguates ?dis
}
This excludes resources that disambiguate some articles, as well as resources that redirect (through one or more redirects) to such a resource. That gives you a query like this:
prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dbo: <http://dbpedia.org/ontology/>
select distinct ?Nom ?resource ?url where {
?resource rdfs:label ?Nom.
?resource foaf:isPrimaryTopicOf ?url.
FILTER (langMatches( lang(?Nom), "EN" )).
?Nom <bif:contains> "Apple".
filter not exists {
?resource dbo:wikiPageRedirects*/dbo:wikiPageDisambiguates ?dis
}
}
Now, even though that removes all the disambiguation pages, you may still have results that include "disambiguation" in the title. For instance, one of the results is:
"The Little Apple (disambiguation)"@en
http://dbpedia.org/resource/The_Little_Apple_(disambiguation)
Even though that has "disambiguation" in the name, it's not a disambiguation page: it doesn't have any values for dbo:wikiPageDisambiguates. It does redirect to another page, though, so you may want to filter out resources that redirect to something else, too. You can modify the filter like this:
filter not exists {
?resource dbo:wikiPageRedirects|dbo:wikiPageDisambiguates ?dis
}
That says to filter out any resource that either redirects to something or disambiguates something. This is actually a simpler filter. Your query then becomes:
prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dbo: <http://dbpedia.org/ontology/>
select distinct ?Nom ?resource ?url where {
?resource rdfs:label ?Nom.
?resource foaf:isPrimaryTopicOf ?url.
FILTER (langMatches( lang(?Nom), "EN" )).
?Nom <bif:contains> "Apple".
filter not exists {
?resource dbo:wikiPageRedirects|dbo:wikiPageDisambiguates ?dis
}
}
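The difference between the two filters lies in the property paths: `dbo:wikiPageRedirects*/dbo:wikiPageDisambiguates` follows any number of redirects and then requires a disambiguation link, while `dbo:wikiPageRedirects|dbo:wikiPageDisambiguates` matches a resource that has either link itself. The Python sketch evaluates both over a small invented graph; the resource names are hypothetical, and each resource is assumed to have at most one redirect target:

```python
# Invented graph for illustration; not real DBpedia links.
REDIRECTS = {"Little_Apple_(disambiguation)": "Some_Target"}  # at most one target each
DISAMBIGUATES = {"Apple_(disambiguation)": {"Apple", "Apple_Inc."}}
RESOURCES = ["Apple", "Apple_(disambiguation)", "Little_Apple_(disambiguation)", "Some_Target"]


def star_path(r):
    """?r dbo:wikiPageRedirects*/dbo:wikiPageDisambiguates ?dis"""
    seen = set()
    while r not in seen:  # zero or more redirect hops, guarding against loops
        if DISAMBIGUATES.get(r):
            return True
        seen.add(r)
        if r not in REDIRECTS:
            return False
        r = REDIRECTS[r]
    return False


def alternative_path(r):
    """?r dbo:wikiPageRedirects|dbo:wikiPageDisambiguates ?dis"""
    return r in REDIRECTS or bool(DISAMBIGUATES.get(r))


# FILTER NOT EXISTS keeps the resources for which the path does NOT match.
print([r for r in RESOURCES if not star_path(r)])
print([r for r in RESOURCES if not alternative_path(r)])
```

With the first filter the redirecting "(disambiguation)" resource survives, because its redirect target disambiguates nothing; with the second it is dropped, as described above.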

SPARQL query on DBpedia based on name, not subject

I am trying to query DBpedia to get some people data, and I don't have subjects, just the names of the people I want to query and their birth/death dates.
I am trying to do a query along these lines. I want the name, birth date, death date and thumbnail of everyone with the surname Presley. What I then intend to do is loop through the returned results and find the best match for Elvis Presley 1935-1977, which is the data I have.
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?Name ?thumbnail ?birthDate ?deathDate WHERE {
?person dbo:name ?Name ;
dbo:birthDate ?birthDate ;
dbo:deathDate ?deathDate ;
dbo:thumbnail ?thumbnail .
FILTER contains(str(?Name), "Presley")
}
What is the best way to construct my sparql query?
UPDATE:
I have put together this query, which seems to work to some extent, but I don't entirely understand it and I can't figure out how to use contains. It does at least run and return results.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?subject ?thumbnail ?birthdate ?deathdate WHERE {
{
?subject rdfs:label "Elvis Presley"@en ;
dbo:thumbnail ?thumbnail ;
dbo:birthDate ?birthdate ;
dbo:deathDate ?deathdate ;
a owl:Thing .
}
UNION
{
?altName rdfs:label "Elvis Presley"@en ;
dbo:thumbnail ?thumbnail ;
dbo:birthDate ?birthdate ;
dbo:deathDate ?deathdate ;
dbo:wikiPageRedirects ?s .
}
}
Some entities might not have all of that information, so it's better to use OPTIONAL. You can use foaf:surname to check the surname directly.
select * where {
?s foaf:surname "Presley"@en
optional { ?s dbo:name ?name }
optional { ?s dbo:birthDate ?birth }
optional { ?s dbo:deathDate ?death }
optional { ?s dbo:thumbnail ?thumb }
}
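For the plan of looping through the results to find the best match for Elvis Presley 1935-1977, a sketch in Python over the standard SPARQL JSON results format might look like the following. It assumes the variable names of the query above (?s, ?birth, ?death) and that dates come back as xsd:date strings such as 1935-01-08; because of the OPTIONAL clauses, any of those variables may be missing from a row. The scoring rule (one point per matching year) and the sample data are illustrative choices, not part of the answer:

```python
def year(binding, var):
    """Year part of an xsd:date value such as '1935-01-08'; None if unbound or odd."""
    cell = binding.get(var)
    if cell is None:  # variable left unbound by OPTIONAL
        return None
    try:
        return int(cell["value"].split("-")[0])
    except ValueError:
        return None


def best_match(results, birth_year, death_year):
    """Return the binding whose birth/death years match best, or None if none match."""
    def score(binding):
        return (year(binding, "birth") == birth_year) + (year(binding, "death") == death_year)

    bindings = results["results"]["bindings"]
    if not bindings:
        return None
    best = max(bindings, key=score)
    return best if score(best) > 0 else None


# Hypothetical response in the SPARQL JSON results shape.
sample = {"results": {"bindings": [
    {"s": {"type": "uri", "value": "http://dbpedia.org/resource/Hypothetical_Presley"},
     "birth": {"type": "literal", "value": "1960-05-01"}},
    {"s": {"type": "uri", "value": "http://dbpedia.org/resource/Elvis_Presley"},
     "birth": {"type": "literal", "value": "1935-01-08"},
     "death": {"type": "literal", "value": "1977-08-16"}},
]}}
print(best_match(sample, 1935, 1977)["s"]["value"])
```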