How to exclude nodes from path? - sparql

I want to get all mathematicians from DBpedia, so I wrote this query for DBpedia's SPARQL service:
SELECT DISTINCT ?person
{
?person dct:subject ?category.
?category skos:broader* dbc:Mathematicians.
}
The problem with this is that the category Mathematicians is polluted by categories like dbc:Euclid, which in turn pulls in all of Euclidean geometry. I believe it's categories like these that cause the query to fail:
Virtuoso 42000 Error TN...: Exceeded 1000000000 bytes in transitive temp memory. use t_distinct, t_max or more T_MAX_memory options to limit the search or increase the pool
A lot of the problematic categories are in dbc:Wikipedia_categories_named_after_mathematicians.
Is there some way to ignore these categories in the skos:broader* path that would make the error go away?

You can list the categories that you don't want to include by filtering them out:
SELECT DISTINCT ?person
{
?person dct:subject ?category.
?category skos:broader* dbc:Mathematicians.
FILTER (?category NOT IN (dbc:Euclid))
}
But that only drops matches whose ?category is dbc:Euclid itself; its subcategories are still reached. It also won't remove the error, because Virtuoso still needs to traverse the whole skos:broader hierarchy, which exhausts the 'transitive temp memory' named in the message. Other approaches include selecting specific categories or traversing only part of the hierarchy.
To select specific categories you could chain UNION blocks, but VALUES is a simpler syntax for the same thing:
SELECT DISTINCT ?person
{
VALUES ?category {dbc:Mathematicians dbc:Mental_calculators dbc:Lists_of_mathematicians}
?person dct:subject ?category.
}
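For querying part of the hierarchy, you can use a bounded property path expression. This one matches categories whose parent or grandparent is dbc:Mathematicians, i.e. its direct subcategories and their subcategories (but not dbc:Mathematicians itself):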
SELECT DISTINCT ?person
{
?person dct:subject ?category.
?category skos:broader | (skos:broader/skos:broader) dbc:Mathematicians.
# filter as desired - FILTER (?category NOT IN (dbc:Euclid))
}
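The queries above limit which categories are matched, but they cannot cut off a single branch such as dbc:Euclid during traversal. One possible workaround, sketched here in Python, is to expand the hierarchy on the client and skip excluded categories while walking it. The BROADER mapping is made-up toy data standing in for skos:broader triples you would fetch from the endpoint yourself, and subcategories is a hypothetical helper, not part of any library.

```python
from collections import deque

# Hypothetical toy data: category -> set of its skos:broader parents.
BROADER = {
    "Mental_calculators": {"Mathematicians"},
    "Mathematicians_by_nationality": {"Mathematicians"},
    "Greek_mathematicians": {"Mathematicians_by_nationality"},
    "Euclid": {"Greek_mathematicians"},
    "Euclidean_geometry": {"Euclid"},
}


def subcategories(root, broader, excluded=frozenset(), max_depth=None):
    """Return root plus every category below it, pruning excluded branches."""
    # Invert the child -> parents mapping so we can walk downwards.
    narrower = {}
    for child, parents in broader.items():
        for parent in parents:
            narrower.setdefault(parent, set()).add(child)

    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for child in narrower.get(node, ()):
            # An excluded category is never visited, so nothing
            # reachable only through it is visited either.
            if child in excluded or child in seen:
                continue
            seen.add(child)
            queue.append((child, depth + 1))
    return seen


kept = subcategories("Mathematicians", BROADER, excluded={"Euclid"})
print(sorted(kept))
```

With the toy data, excluding "Euclid" also drops "Euclidean_geometry", because that category is only reachable through the pruned node. The surviving names could then be passed to a VALUES block like the one shown earlier.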

Related

I want to get the names of similar types using SPARQL queries from DBpedia

I need to find the names of entities of similar types in DBpedia, so I'm trying to write a query that returns the names of entities that share a dct:subject value with a given entity. For example, to find entities similar to the White House, I would compare their dct:subject values with its own. If there is any other approach, please mention it.
Previously I tried this with rdf:type, but the results were not very good and the query sometimes timed out.
I solved my problem with the query below, and now I want to use dct:subject instead of rdf:type:
select distinct ?label ?resource count(distinct ?type) as ?score where {
values ?type { dbo:Thing dbo:Organization yago:WikicatIslam-relatedControversies yago:WikicatIslamistGroups yago:WikicatRussianFederalSecurityServiceDesignatedTerroristOrganizations yago:Abstraction100002137 yago:Act100030358 yago:Cabal108241798 yago:Group100031264 yago:Movement108464601 yago:PoliticalMovement108472335
}
?resource rdfs:label ?label ;
foaf:name ?name ;
a ?type .
FILTER (lang(?label) = 'en').
}
ORDER BY DESC(?score)

Property paths in SPARQL

I am posing a SPARQL query to the DBpedia endpoint with property paths:
select (COUNT(distinct ?s2) AS ?count) WHERE{
?s2 skos:broader{0,2} dbc:Countries_in_Europe
}
I want to pose the same query without property paths:
select (COUNT(distinct ?s2) AS ?count) (COUNT(distinct ?s1) AS ?count1) WHERE{
?s2 skos:broader dbc:Countries_in_Europe.
?s1 skos:broader ?s2.
}
I have two questions:
1. Is it possible to get ?s1+?s2, i.e. one combined count, for the second query?
2. For the second query, I expected the sum of the two counts plus 1 (for dbc:Countries_in_Europe) to be the same as the count from the first query, but it is not. What is wrong?
Thanks in advance.
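You're using non-standard SPARQL: the {0,2} depth restriction appeared in working drafts of SPARQL 1.1 property paths but did not make it into the final version (see the W3C specs), so it only works on engines that still accept it, such as the Virtuoso instance behind DBpedia.
I guess the first query is supposed to return the sub-categories of the given category up to depth 2, right? Your second query doesn't do the same. It joins the two levels, so a direct sub-category with no sub-categories of its own is dropped, and a category reachable at both depths is counted once in each column. You have to use a UNION with one branch per distance, i.e. one for the direct sub-categories and one for the sub-categories of those.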
SELECT (COUNT(distinct ?s) AS ?count) WHERE {
{
?s skos:broader dbc:Countries_in_Europe
} UNION {
?s1 skos:broader dbc:Countries_in_Europe.
?s skos:broader ?s1
}
}
Note that the {0,2} in your first query includes distance 0, so the category dbc:Countries_in_Europe itself is also part of the result. If you need it, add 1 to the count returned by the UNION query.
Update
As per @JoshuaTaylor's comment below, a more compact syntax would be
SELECT (COUNT(distinct ?s) AS ?count) WHERE {
?s skos:broader/skos:broader? dbc:Countries_in_Europe
}
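To see why the counts differ, the formulations can be compared on a small in-memory graph. This is only a sketch in Python: the EDGES set is invented toy data, not DBpedia content, and the set operations mimic what each query counts.

```python
# Hypothetical toy edges: (narrower category, broader category).
EDGES = {
    ("Nordic_countries", "Countries_in_Europe"),
    ("Balkan_countries", "Countries_in_Europe"),
    ("Microstates", "Countries_in_Europe"),   # has no sub-categories
    ("Denmark", "Nordic_countries"),
    ("Sweden", "Nordic_countries"),
    ("Serbia", "Balkan_countries"),
    ("Copenhagen", "Denmark"),                # depth 3, out of range
}


def children(parents):
    """Categories that have a skos:broader link to any of `parents`."""
    return {child for child, parent in EDGES if parent in parents}


root = {"Countries_in_Europe"}
depth1 = children(root)
depth2 = children(depth1)

# skos:broader{0,2}: the root itself plus both levels.
path_0_to_2 = root | depth1 | depth2

# UNION of the two levels, same as skos:broader/skos:broader?
union_query = depth1 | depth2

# The join "?s2 broader root . ?s1 broader ?s2" only yields pairs,
# so a depth-1 category without children never binds ?s2.
pairs = {(s1, s2) for s1, s2 in EDGES if s2 in depth1}
count_s2 = len({s2 for _, s2 in pairs})
count_s1 = len({s1 for s1, _ in pairs})

print(len(path_0_to_2), len(union_query) + 1, count_s2 + count_s1 + 1)
```

On this toy graph the script prints 7 7 6: the UNION count plus 1 matches the {0,2} count, while the joined query misses "Microstates" because it has no sub-categories.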

DBpedia SPARQL to eliminate unwanted data

PREFIX category: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?attractions
?location
WHERE
{ ?attractions dcterms:subject ?places
. ?places skos:broader ?border
. ?attractions dbpprop:location|dbpedia-owl:locatedInArea|dbpprop:locale ?location
. FILTER( ?border = category:Visitor_attractions_in_Delhi )
}
The query above returns attractions in Delhi together with their locations. I need to make it generic for all places, and I also want to filter out unwanted data: I want only actual attractions, so entries such as List of Monuments and SelectCityWalk should not appear in my output.

How to handle Wikipedia Named Entities that have the same Category name

I was trying to extract all US companies so I ran the following query
PREFIX cat: <http://dbpedia.org/resource/Category:>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?page ?subcat WHERE { ?subcat skos:broader* cat:Companies_of_the_United_States_by_industry .
?page dcterms:subject ?subcat .
?page rdfs:label ?pageName.
}
This is a snapshot of the results
Amgen and Pfizer are both companies and categories, so I end up collecting everything under the Pfizer and Amgen categories (people, products). I found out that these entries belong to the Wikipedia categories Category:Wikipedia_categories_named_after_companies_of_the_United_States or Category:Wikipedia_categories_named_after_pharmaceutical_companies_of_the_United_States, so I tried to filter those categories out like this:
SELECT DISTINCT ?page ?subcat WHERE { ?subcat skos:broader* cat:Companies_of_the_United_States_by_industry .
?page dcterms:subject ?subcat .
?page rdfs:label ?pageName.
FILTER( !regex(?subcat,"Wikipedia_categories_named_after_pharmaceutical_companies_of_the_United_States")) }
But no luck, they are still there. Any idea how to avoid this problem?
The problem doesn't have anything to do with them having the same name. Wikipedia categories don't form a type hierarchy, so it doesn't make sense to treat them like one. You see these results because there is a category Pfizer whose skos:broader values include the company listings, and which is also the dcterms:subject of dbpedia:Alprazolam, dbpedia:Cetirizine, etc. That doesn't make sense as a type hierarchy, but it is fine for organizing article topics. If you only want companies back, just ask for things that are companies:
SELECT DISTINCT ?page ?subcat WHERE {
?subcat skos:broader* category:Companies_of_the_United_States_by_industry .
?page dcterms:subject ?subcat .
?page rdfs:label ?pageName.
?page a dbpedia-owl:Company
}
We can clean that up a bit, though. You're not using ?pageName, so we can remove the rdfs:label pattern. We can use some of the shorter syntaxes to make things a little bit cleaner. We can also note that "Companies … by industry" has a skos:broader value "Companies of the United States", which makes the intent of the query a bit clearer.
select distinct ?company ?subcategory where {
?company dcterms:subject ?subcategory ;
a dbpedia-owl:Company .
?subcategory skos:broader* category:Companies_of_the_United_States .
}
limit 1000
SPARQL results
As a final note, the category hierarchy doesn't necessarily mean that each company has a single path to the top category. That is, you could get some company listed multiple times, e.g.:
company subcategory
------------------------------------
companyX Textile_Companies
companyX Companies_in_New_Hampshire
Unless you need the listing of subcategories, you might consider eliminating it from the query, in which case you can simply have (using property paths):
select distinct ?company where {
?company a dbpedia-owl:Company ;
dcterms:subject/skos:broader* category:Companies_of_the_United_States .
}
limit 1000
SPARQL results
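The effect of the type restriction can be illustrated on toy data. This Python sketch assumes made-up BROADER, SUBJECT and TYPES mappings (they only mirror the shape of skos:broader, dcterms:subject and rdf:type triples) and a hypothetical helper is_under; none of it is fetched from DBpedia.

```python
# Hypothetical toy data, shaped like the triples used in the queries.
ROOT = "Category:Companies_of_the_United_States"
BROADER = {  # category -> skos:broader parents
    "Category:Pharmaceutical_companies_of_the_United_States": {ROOT},
    "Category:Pfizer": {"Category:Pharmaceutical_companies_of_the_United_States"},
}
SUBJECT = {  # page -> dcterms:subject categories
    "Pfizer": {"Category:Pharmaceutical_companies_of_the_United_States"},
    "Amgen": {"Category:Pharmaceutical_companies_of_the_United_States"},
    "Alprazolam": {"Category:Pfizer"},
}
TYPES = {  # page -> rdf:type values
    "Pfizer": {"Company"},
    "Amgen": {"Company"},
    "Alprazolam": {"Drug"},
}


def is_under(category, root):
    """True if root is reachable from category via zero or more broader links."""
    stack, seen = [category], set()
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue  # guard against cycles in the category graph
        seen.add(current)
        stack.extend(BROADER.get(current, ()))
    return False


# dcterms:subject/skos:broader* alone also returns the drug page.
pages_in_tree = {
    page for page, cats in SUBJECT.items()
    if any(is_under(cat, ROOT) for cat in cats)
}

# Adding the type check keeps only the pages typed as companies.
companies = {page for page in pages_in_tree if "Company" in TYPES.get(page, ())}

print(sorted(pages_in_tree), sorted(companies))
```

Because the result is a set of pages, a company that sits in several subcategories still appears only once, which is what the DISTINCT query without ?subcategory achieves.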

retrieving most specific classes of instances

Is it possible to get a definition of a resource (from DBpedia) with a SPARQL query? I want something like the TBox and ABox that are shown in (Conceptual) Clustering methods for the Semantic Web: issues and applications (slides 10–11). For example, for the DBpedia resource Stephen King, I would like to have:
Stephen_King : Person ⊓ Writer ⊓ Male ⊓ … (most specific classes)
You can use a query like the following to ask for the classes of which Stephen King is an instance which have no subclasses of which Stephen King is also an instance. This seems to align well with the idea of “most specific classes.” However, since (as far as I know) there's no reasoner attached to the DBpedia SPARQL endpoint, there may be subclass relationships that could be inferred but which aren't explicitly present in the data.
select distinct ?type where {
dbr:Stephen_King a ?type .
filter not exists {
?subtype ^a dbr:Stephen_King ;
rdfs:subClassOf ?type .
}
}
SPARQL results
Actually, since every class is an rdfs:subClassOf itself (under RDFS semantics, and a dataset may state this explicitly), you might want to add another line to that query to exclude the case where ?subtype and ?type are the same:
select distinct ?type where {
dbr:Stephen_King a ?type .
filter not exists {
?subtype ^a dbr:Stephen_King ;
rdfs:subClassOf ?type .
filter ( ?subtype != ?type )
}
}
SPARQL results
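The logic of the filter can be checked on a tiny example. The following Python sketch uses invented TYPES and SUBCLASS_OF data and a hypothetical most_specific helper; like the query, it only looks at explicitly stated, single-step subclass links and does no inference.

```python
# Hypothetical toy data: the types asserted for one resource, and
# explicitly stated (sub, super) rdfs:subClassOf pairs.
TYPES = {"Agent", "Person", "Writer"}
SUBCLASS_OF = {
    ("Person", "Agent"),
    ("Writer", "Person"),
    ("Writer", "Writer"),  # reflexive statement, as some datasets contain
}


def most_specific(types, subclass_of, ignore_reflexive=True):
    """Keep the types that have no other asserted type as a direct subclass."""
    result = set()
    for candidate in types:
        has_subtype = any(
            sup == candidate
            and sub in types
            and not (ignore_reflexive and sub == candidate)
            for sub, sup in subclass_of
        )
        if not has_subtype:
            result.add(candidate)
    return result


print(most_specific(TYPES, SUBCLASS_OF))
print(most_specific(TYPES, SUBCLASS_OF, ignore_reflexive=False))
```

With the reflexive pair ignored, only "Writer" survives. Without that check, "Writer" counts as its own subtype and the result is empty, which is the situation the extra filter ( ?subtype != ?type ) guards against.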
If you actually want a result string like the one shown in those slides, you could use values to bind a variable to dbr:Stephen_King, and then use some grouping and string concatenation to get something nicer looking (sort of):
select
(concat( str(?person), " =\n", group_concat(str(?type); separator=" AND\n")) as ?sentence)
where {
values ?person { dbr:Stephen_King }
?type ^a ?person .
filter not exists {
?subtype ^a ?person ;
rdfs:subClassOf ?type .
filter ( ?subtype != ?type )
}
}
group by ?person
SPARQL results
http://dbpedia.org/resource/Stephen_King =
http://dbpedia.org/class/yago/AuthorsOfBooksAboutWritingFiction AND
http://dbpedia.org/ontology/Writer AND
http://schema.org/Person AND
http://xmlns.com/foaf/0.1/Person AND
http://dbpedia.org/class/yago/AmericanSchoolteachers AND
http://dbpedia.org/class/yago/LivingPeople AND
http://dbpedia.org/class/yago/PeopleFromBangor,Maine AND
http://dbpedia.org/class/yago/PeopleFromPortland,Maine AND
http://dbpedia.org/class/yago/PeopleFromSarasota,Florida AND
http://dbpedia.org/class/yago/PeopleSelf-identifyingAsAlcoholics AND
http://umbel.org/umbel/rc/Artist AND
http://umbel.org/umbel/rc/Writer AND
http://dbpedia.org/class/yago/20th-centuryNovelists AND
http://dbpedia.org/class/yago/21st-centuryNovelists AND
http://dbpedia.org/class/yago/AmericanHorrorWriters AND
http://dbpedia.org/class/yago/AmericanNovelists AND
http://dbpedia.org/class/yago/AmericanShortStoryWriters AND
http://dbpedia.org/class/yago/CthulhuMythosWriters AND
http://dbpedia.org/class/yago/HorrorWriters AND
http://dbpedia.org/class/yago/WritersFromMaine AND
http://dbpedia.org/class/yago/PeopleFromDurham,Maine AND
http://dbpedia.org/class/yago/PeopleFromLisbon,Maine AND
http://dbpedia.org/class/yago/PostmodernWriters