DBpedia SPARQL to eliminate unwanted data

PREFIX category: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?attractions
?location
WHERE
{ ?attractions dcterms:subject ?places
. ?places skos:broader ?border
. ?attractions dbpprop:location|dbpedia-owl:locatedInArea|dbpprop:locale ?location
. FILTER( ?border = category:Visitor_attractions_in_Delhi )
}
The query above returns the attractions of Delhi along with their locations. I need to make it generic for all places, and I also want to filter out unwanted data. I want only actual attractions; for example, I don't want entries like List of Monuments or SelectCityWalk in my output.
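One possible way to drop the unwanted rows is to filter on the resource IRI and on its type. The following is only a sketch under assumptions: it assumes list pages have IRIs starting with http://dbpedia.org/resource/List_of_, and that shopping malls are typed dbo:ShoppingMall in the dataset you query (neither is confirmed by the question). The location pattern from the original query is left out for brevity.

```sparql
PREFIX dbc:  <http://dbpedia.org/resource/Category:>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?attraction WHERE {
  # Same category pattern as the original query.
  ?category skos:broader dbc:Visitor_attractions_in_Delhi .
  ?attraction dct:subject ?category .
  # Assumption: "List of ..." pages share this IRI prefix.
  FILTER (!STRSTARTS(STR(?attraction), "http://dbpedia.org/resource/List_of_"))
  # Assumption: malls such as SelectCityWalk carry this type.
  FILTER NOT EXISTS { ?attraction a dbo:ShoppingMall }
}
```

Further FILTER NOT EXISTS lines can be added for any other type that turns out to be unwanted.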

Related

I want to get the names of similar types using SPARQL queries from DBpedia

I need to find the names of similar entities in DBpedia, so I'm trying to write a query that returns the entities sharing the same categories in dct:subject. For example, I want to find entities similar to the White House, and I'm considering dct:subject to find them. If there is any other approach, please mention it.
Previously I tried rdf:type, but the results are not very good and the query sometimes times out.
The query below is what I have so far; now I want to use dct:subject instead of rdf:type.
select distinct ?label ?resource (count(distinct ?type) as ?score) where {
values ?type { dbo:Thing dbo:Organization yago:WikicatIslam-relatedControversies yago:WikicatIslamistGroups yago:WikicatRussianFederalSecurityServiceDesignatedTerroristOrganizations yago:Abstraction100002137 yago:Act100030358 yago:Cabal108241798 yago:Group100031264 yago:Movement108464601 yago:PoliticalMovement108472335
}
?resource rdfs:label ?label ;
foaf:name ?name ;
a ?type .
FILTER (lang(?label) = 'en').
}
GROUP BY ?label ?resource
ORDER BY DESC(?score)
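A dct:subject variant could rank resources by how many categories they share with the seed entity. This is a minimal sketch, assuming dbr:White_House is the seed and that the number of shared categories is an acceptable similarity score; the LIMIT value is arbitrary.

```sparql
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?resource ?label (COUNT(DISTINCT ?category) AS ?score) WHERE {
  # Categories of the seed entity (assumed seed: the White House).
  dbr:White_House dct:subject ?category .
  # Other resources filed under the same categories.
  ?resource dct:subject ?category ;
            rdfs:label ?label .
  FILTER (?resource != dbr:White_House)
  FILTER (lang(?label) = "en")
}
GROUP BY ?resource ?label
ORDER BY DESC(?score)
LIMIT 20
```

Because the pattern starts from a single bound resource, it touches far fewer triples than matching on broad types, which may help with the timeouts.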

How to exclude nodes from path?

I want to get all mathematicians from DBpedia, so I wrote this query for DBpedia's SPARQL service:
SELECT DISTINCT ?person
{
?person dct:subject ?category.
?category skos:broader* dbc:Mathematicians.
}
The problem with this is that the category Mathematicians is polluted, due to categories like dbc:Euclid, which then includes all of Euclidean geometry. I believe it's categories like these which cause the query to fail:
Virtuoso 42000 Error TN...: Exceeded 1000000000 bytes in transitive temp memory. use t_distinct, t_max or more T_MAX_memory options to limit the search or increase the pool
A lot of the problematic categories are in dbc:Wikipedia_categories_named_after_mathematicians.
Is there some way to ignore these categories in the skos:broader* path that would make the error go away?
You can list the categories that you don't want to include by filtering them out:
SELECT DISTINCT ?person
{
?person dct:subject ?category.
?category skos:broader* dbc:Mathematicians.
FILTER (?category NOT IN (dbc:Euclid))
}
But that won't remove the error, because Virtuoso still needs to traverse the skos:broader hierarchy, exhausting the 'transitive temp memory' named in the error message. Other approaches include selecting specific categories or traversing only part of the hierarchy.
Selecting specific categories could be done with UNION, but VALUES is a simpler syntax:
SELECT DISTINCT ?person
{
VALUES ?category {dbc:Mathematicians dbc:Mental_calculators dbc:Lists_of_mathematicians}
?person dct:subject ?category.
}
For querying part of the hierarchy, you can use property path expressions. This one matches categories whose parent or grandparent is dbc:Mathematicians, that is, categories one or two levels below it:
SELECT DISTINCT ?person
{
?person dct:subject ?category.
?category skos:broader | (skos:broader/skos:broader) dbc:Mathematicians.
# filter as desired - FILTER (?category NOT IN (dbc:Euclid))
}
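If the eponymous categories mentioned in the question sit directly below dbc:Mathematicians, a depth-bounded pattern with an explicit intermediate variable lets you reject paths that pass through them. The following is a minimal sketch, not a tested solution: it assumes those categories list dbc:Wikipedia_categories_named_after_mathematicians as a direct skos:broader value, and ?mid is a variable name introduced here for illustration.

```sparql
PREFIX dbc:  <http://dbpedia.org/resource/Category:>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?person WHERE {
  ?person dct:subject ?category .
  # ?mid is either ?category itself or its direct parent...
  ?category skos:broader? ?mid .
  # ...and ?mid must be a direct child of dbc:Mathematicians.
  ?mid skos:broader dbc:Mathematicians .
  # Reject paths whose direct child is an eponymous category (assumed marker).
  FILTER NOT EXISTS {
    ?mid skos:broader dbc:Wikipedia_categories_named_after_mathematicians
  }
}
```

This covers the same one or two levels as the query above, but because the intermediate category is bound to a variable, it can be tested in a filter.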

Not able to get Indian cities abstract from Sparql

I am trying to get abstracts using SPARQL with the DBpedia datasets.
When I run the following query on Virtuoso,
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?abstract WHERE {
[ rdfs:label ?name
; dbpedia-owl:abstract ?abstract
] .
FILTER langMatches(lang(?abstract),"en")
VALUES ?name { "London"@en }
}
LIMIT 10
I get a result. However, if I change the name to 'Gokarna', which is a South Indian tourist spot, I get no data, even though I can see the resource page for Gokarna on DBpedia (http://dbpedia.org/page/Gokarna,_India). What am I doing wrong? I need to get similar data for close to 800 Indian places.
When you use VALUES, you get only the resources whose label exactly matches your string. For Gokarna, that would work for @de, @it, @fr, but not for @en, because the English label is different, as you can also see from the previous answer.
I would suggest using CONTAINS instead of VALUES:
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?abstract WHERE {
[ rdfs:label ?name
; dbpedia-owl:abstract ?abstract
] .
FILTER langMatches(lang(?abstract),"en")
FILTER langMatches(lang(?name),"en")
FILTER CONTAINS (?name, "Gokarna" )
}
LIMIT 10
I am not that experienced with SPARQL, but from what I can see in your code and on the DBpedia page...
the label is not just "Gokarna". It is "Gokarna, India" (the resource name is Gokarna,_India).
This should work:
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?abstract WHERE {
[ rdfs:label ?name
; dbpedia-owl:abstract ?abstract
] .
FILTER langMatches(lang(?abstract),"en")
VALUES ?name { "Gokarna, India"@en }
}
LIMIT 10
If you look through the DBpedia page for Gokarna, India that you linked to, you'll notice that its rdfs:label is "Gokarna, India". But its foaf:name is just "Gokarna". This would mean you should modify your query to:
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?abstract WHERE {
[ foaf:name ?name
; dbpedia-owl:abstract ?abstract
] .
FILTER langMatches(lang(?abstract),"en")
VALUES ?name { "Gokarna"@en }
}
LIMIT 10
Though this will return other Gokarnas too: Gokarna, Nepal, Gokarna, Bangladesh and Gokarna (film). If you want to remove these, you will have to figure out another filter (possibly dbo:country dbr:India).
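As a sketch of that extra filter, the country constraint can be added as one more triple pattern, and several names can go into the same VALUES block to cover many places in one request. This assumes the resources you want actually carry dbo:country dbr:India, which is not guaranteed for every place; "Hampi" is only a placeholder second name.

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?place ?name ?abstract WHERE {
  # One entry per place name; "Hampi" is a placeholder example.
  VALUES ?name { "Gokarna"@en "Hampi"@en }
  ?place foaf:name ?name ;
         dbo:country dbr:India ;   # assumption: present on the target resources
         dbo:abstract ?abstract .
  FILTER langMatches(lang(?abstract), "en")
}
```

Selecting ?place and ?name alongside the abstract makes it possible to tell which result belongs to which input name.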

How to traverse skos:broader property to list tourist places in various cities

Any help in my learning will be highly appreciated.
Problem statement:
I need to print a list of tourist places in various cities.
I have the category http://dbpedia.org/page/Category:Tourism_by_city, which I have to explore. The problem is that this category is the skos:broader of various other categories, such as http://dbpedia.org/page/Category:Tourism_in_Bratislava, which in turn is the skos:broader of http://dbpedia.org/page/Category:Visitor_attractions_in_Bratislava. The resources that point to that last category through dcterms:subject are the tourist places.
I have to explore all cities, starting from the Tourism_by_city category.
What I have done
SELECT DISTINCT ?places
WHERE {
?entity skos:broader* <http://dbpedia.org/resource/Category:Tourism_by_city> .
?places dcterms:subject ?entity
}
Problem:
skos:broader* keeps exploring the graph; I want to stop at the Visitor_attractions level. It also explores every category, but I want it to follow only the Visitor_attractions categories.
Level 1 : Tourism_by_city -> explore all categories that have it as skos:broader
Level 2 : Tourism_in_xxxcity -> explore only category:Visitor_attractions_in_xxxcity
Level 3 : Do not explore further.
Is this achievable?
Please let me know if the question is unclear. Thanks.
I finally solved it, though I don't know whether this is the only viable option:
SELECT DISTINCT (str(?city) as ?City) (str(?label) as ?Attractions)
WHERE {
?entity skos:broader <http://dbpedia.org/resource/Category:Tourism_by_city> .
?places skos:broader ?entity .
?places rdfs:label ?city .
FILTER langMatches(lang(?city),"en") .
?attractions dcterms:subject ?places .
?attractions rdfs:label ?label .
FILTER langMatches(lang(?label),"en").
FILTER (if (isliteral(?city ), contains(str(?city ), "Visitor"), false))
}
ORDER BY ASC(?city)
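An alternative to matching on the English label is to test the category IRI itself. This is a sketch under the assumption that the wanted categories all follow the Visitor_attractions_in_ naming pattern shown in the question; the variable names are illustrative.

```sparql
PREFIX dbc:  <http://dbpedia.org/resource/Category:>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?category ?attraction WHERE {
  # Level 1: per-city tourism categories.
  ?cityTourism skos:broader dbc:Tourism_by_city .
  # Level 2: their direct subcategories.
  ?category skos:broader ?cityTourism .
  # Keep only the visitor-attraction categories (assumed naming pattern).
  FILTER STRSTARTS(STR(?category),
    "http://dbpedia.org/resource/Category:Visitor_attractions_in_")
  # Level 3: resources filed directly under those categories.
  ?attraction dct:subject ?category .
}
ORDER BY ?category
```

Using fixed single-step skos:broader patterns instead of skos:broader* is what stops the traversal at the intended level.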

How to handle Wikipedia Named Entities that have the same Category name

I was trying to extract all US companies so I ran the following query
PREFIX cat: <http://dbpedia.org/resource/Category:>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?page ?subcat WHERE { ?subcat skos:broader* cat:Companies_of_the_United_States_by_industry .
?page dcterms:subject ?subcat .
?page rdfs:label ?pageName.
}
This is a snapshot of the results
Amgen and Pfizer are both companies and categories, so I end up collecting everything under Pfizer and Amgen (people, products). I found out that these entries belong to Wikipedia categories called Category:Wikipedia_categories_named_after_companies_of_the_United_States or Category:Wikipedia_categories_named_after_pharmaceutical_companies_of_the_United_States. So I tried to filter these categories out like this:
SELECT DISTINCT ?page ?subcat WHERE { ?subcat skos:broader* cat:Companies_of_the_United_States_by_industry .
?page dcterms:subject ?subcat .
?page rdfs:label ?pageName.
FILTER( !regex(?subcat,"Wikipedia_categories_named_after_pharmaceutical_companies_of_the_United_States")) }
But no luck, they are still there. Any idea how to avoid this problem?
The problem doesn't have anything to do with them having the same name. Wikipedia categories don't form a type hierarchy, so it doesn't make sense to treat them like one. The reason you see the results that you're seeing is that there's a category Pfizer, and that its broader values include the company listings, but is also the dcterms:subject of dbpedia:Alprazolam, dbpedia:Cetirizine, etc. It doesn't make sense as a type hierarchy, but it is fine for organizing article topics. If you only want companies back, just ask for things that are companies:
SELECT DISTINCT ?page ?subcat WHERE {
?subcat skos:broader* category:Companies_of_the_United_States_by_industry .
?page dcterms:subject ?subcat .
?page rdfs:label ?pageName.
?page a dbpedia-owl:Company
}
We can clean that up a bit, though. You're not using the label (?pageName), so we can remove it. We can use some of the shorter syntaxes to make things a little bit cleaner. We can also note that "Companies … by industry" has a skos:broader value "Companies of the United States", which makes the intent of the query a bit clearer.
select distinct ?company ?subcategory where {
?company dcterms:subject ?subcategory ;
a dbpedia-owl:Company .
?subcategory skos:broader* category:Companies_of_the_United_States .
}
limit 1000
As a final note, the category hierarchy doesn't necessarily mean that each company has a single path to the top category. That is, you could get some company listed multiple times, e.g.:
company     subcategory
------------------------------------
companyX    Textile_Companies
companyX    Companies_in_New_Hampshire
Unless you need the listing of subcategories, you might consider eliminating it from the query, in which case you can simply have (using property paths):
select distinct ?company where {
?company a dbpedia-owl:Company ;
dcterms:subject/skos:broader* category:Companies_of_the_United_States .
}
limit 1000