extract city data from dbpedia or LinkedGeoData - sparql

I've been trying for a couple of hours to figure out how to get various pieces of information out of DBpedia or LinkedGeoData.
I used this interface (http://dbpedia.org/snorql) and tried a few different approaches, but I never got the result I need.
If I use something like this:
SELECT * WHERE {
?subject rdf:type <http://dbpedia.org/ontology/City>.
OPTIONAL {
?subject <http://dbpedia.org/ontology/populationTotal> ?populationTotal.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/populationUrban> ?populationUrban.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/areaTotal> ?areaTotal.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/populationUrbanDensity> ?populationUrbanDensity.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/isPartOf> ?isPartOf.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/country> ?country.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/utcOffset> ?utcOffset.
}
OPTIONAL {
?subject <http://dbpedia.org/property/janHighC> ?janHighC.
}
OPTIONAL {
?subject <http://dbpedia.org/property/janLowC> ?janLowC.
}
}
LIMIT 20
I run out of limits.
I also tried this:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * WHERE {
?subject rdf:type <http://dbpedia.org/ontology/City>.
?subject rdfs:label ?label.
FILTER ( lang(?label) = 'en'
}
LIMIT 100
But that gives me an error, which I don't understand. If I remove the FILTER, it works but gives me the labels in all languages...
What I'm looking for is something like this: http://dbpedia.org/page/Vancouver
Not all of the data, just some of it, such as population, area, country, elevation, lat, long, timezone, label@en, abstract@en, etc.
Can someone help me get a working syntax?
Thanks for all your help.
UPDATE:
I got it to work so far with:
SELECT DISTINCT *
WHERE {
?city rdf:type dbpedia-owl:Settlement ;
rdfs:label ?label;
dbpedia-owl:abstract ?abstract ;
dbpedia-owl:populationTotal ?pop ;
dbpedia-owl:country ?country ;
dbpprop:website ?website .
FILTER ( lang(?abstract) = 'en' && lang(?label) = 'en')
}
LIMIT 20
But I still run out of limits if I want to get all settlements. By the way, is there a way to get all cities and settlements in one table?

By "run out of limits", do you mean the error "Bandwidth Limit Exceeded URI = '/!sparql/'"? I guess this is a limit set by dbpedia to make sure that it is not flooded with queries that take "forever" to run, and if so, then there is probably not so much you can do. You can ask for data in chunks, using OFFSET, LIMIT and ORDER BY, see http://www.w3.org/TR/rdf-sparql-query/#modOffset.
UPDATE: Yes, this seems to be the way to go: http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg03368.html
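As a rough sketch of the chunking idea, here is a small Python helper that only builds the paged query strings (it does not contact any endpoint). The function name `paged_queries`, the page size of 1000 and the example query are my own assumptions, not something DBpedia prescribes; `rdf:` and `dbpedia-owl:` are assumed to be predefined by the endpoint, as in the question.

```python
def paged_queries(base_query, order_by, page_size, max_pages):
    """Yield copies of base_query with ORDER BY / LIMIT / OFFSET appended.

    A stable ORDER BY matters: without it the endpoint is free to return
    rows in a different order for each page.
    """
    for page in range(max_pages):
        offset = page * page_size
        yield (
            f"{base_query.rstrip()}\n"
            f"ORDER BY {order_by}\n"
            f"LIMIT {page_size}\n"
            f"OFFSET {offset}"
        )


# Hypothetical base query, modelled on the one in the question.
BASE = """SELECT ?city ?pop WHERE {
  ?city rdf:type dbpedia-owl:Settlement ;
        dbpedia-owl:populationTotal ?pop .
}"""

for query in paged_queries(BASE, "?city", page_size=1000, max_pages=3):
    # Prints the last line of each page: OFFSET 0, OFFSET 1000, OFFSET 2000
    print(query.splitlines()[-1])
```

Each generated string can then be sent as a separate request; stop paging as soon as a page comes back with fewer rows than the page size.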
In the second query the error is a missing parenthesis. This
FILTER ( lang(?label) = 'en'
should be
FILTER ( lang(?label) = 'en')
For your last question, a natural way to collect the results of several similar queries in one query/table is to use UNION, e.g.,
SELECT ?x
WHERE {
{ ?x rdf:type dbpedia-owl:City }
UNION
{ ?x rdf:type dbpedia-owl:Settlement }
}

Related

how to find classes from dbpedia?

I need a SPARQL query that, given free text (user input),
finds all the DBpedia classes related to it.
How do I do it?
Also asked here. Accepted answer said --
When you say classes, do you mean types? If yes, try something like
SELECT ?uri ?label ?type
WHERE {
?uri rdfs:label ?label .
?uri <http://dbpedia.org/ontology/type> ?type .
FILTER regex(str(?label), "Leipzig") .
}
limit 10
I couldn't let this go...
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX virtrdf: <http://www.openlinksw.com/schemas/virtrdf#>
SELECT ?s1c AS ?c1
COUNT (*) AS ?c2
?c3
WHERE
{
QUAD MAP virtrdf:DefaultQuadMap
{
GRAPH ?g
{
?s1 ?s1textp ?o1 .
?o1 bif:contains '"dbpedia"' .
}
}
?s1 a ?s1c .
OPTIONAL { ?s1c rdfs:label ?c3
FILTER(langMatches(LANG(?c3),"EN"))}
}
GROUP BY ?s1c ?c3
ORDER BY DESC (2) ASC (3)
The earlier answer gets you partial results.

SPARQL Matching Literals with **ANY** Language Tags without running into timeout

I need to select the entities that have a "taxon rank (P105)" of "species (Q7432)" and a label that matches a literal string such as "Topinambur".
I'm testing the queries on https://query.wikidata.org;
this query works fine and returns the entity with a satisfactory response time:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?entity rdfs:label "Topinambur"@de .
?entity wdt:P105 wd:Q7432.
}
LIMIT 100
The problem is that my requirement is not to specify the language, but the lexical forms of the labels in the underlying dataset (Wikidata) have language tags, so I need a way to get literal equality for any language.
I tried some possible solutions, but I didn't find any query that didn't result in the following:
TIMEOUT message com.bigdata.bop.engine.QueryTimeoutException: Query deadline is expired
Here is the list of what I tried (and I always get TIMEOUT):
1) based on this answer I tried:
SELECT * WHERE {
?entity rdfs:label ?label FILTER ( str( ?label ) = "Topinambur") .
?entity wdt:P105 wd:Q7432.
}
LIMIT 100
2) based on some other documentation I tried:
SELECT * WHERE {
?entity wdt:P105 wd:Q7432.
?entity rdfs:label ?label FILTER regex(?label, "^Topinambur") .
}
LIMIT 100
3) and
SELECT * WHERE {
?entity wdt:P105 wd:Q7432.
?entity rdfs:label ?label .
FILTER langMatches( lang(?label), "*" )
FILTER (?label = "Topinambur")
}
LIMIT 100
What I'm looking for is a performant solution or some SPARQL syntax that doesn't end in a TIMEOUT message.
PS: with reference to http://www.rfc-editor.org/rfc/bcp/bcp47.txt, I don't understand whether language ranges or `wildcards` could help in some way.
EDIT
I successfully tested (without hitting a timeout) a similar query in DBpedia by using the Virtuoso query editor at:
https://dbpedia.org/sparql
Default Data Set Name (Graph IRI):http://dbpedia.org
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?resource
WHERE {
?resource rdfs:label ?label . FILTER ( str( ?label ) = "Topinambur").
?resource rdf:type dbo:Species
}
LIMIT 100
I am still very interested in understanding the performance problem that I experience on Wikidata and what is the best syntax to use.
I solved a similar problem: I wanted to find an entity by a label string in any language. I recommend not using FILTER, because it is too slow. Use UNION instead, like this:
SELECT ?entity WHERE {
?entity wdt:P105 wd:Q7432.
{ ?entity rdfs:label "Topinambur"@de . }
UNION { ?entity rdfs:label "Topinambur"@en . }
UNION { ?entity rdfs:label "Topinambur"@fr . }
}
GROUP BY ?entity
LIMIT 100
This solution is not perfect, because you have to enumerate all languages, but it is fast and reliable. A list of all available Wikidata languages is here.
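Since enumerating the languages by hand gets tedious, the UNION branches can be generated. Below is a minimal Python sketch that only assembles the query text; the helper names `label_union` and `species_by_label` and the three-language list are made up for illustration. It relies on the `wd:`, `wdt:` and `rdfs:` prefixes that the Wikidata query service predefines.

```python
def label_union(label, languages, var="?entity"):
    """Build one UNION branch per language tag for an exact label match."""
    # Escape backslashes and double quotes so the label stays a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    branches = [
        f'{{ {var} rdfs:label "{escaped}"@{lang} . }}' for lang in languages
    ]
    return "\n  UNION ".join(branches)


def species_by_label(label, languages):
    """Wrap the UNION branches in the species query from the answer above."""
    return (
        "SELECT DISTINCT ?entity WHERE {\n"
        "  ?entity wdt:P105 wd:Q7432 .\n"
        f"  {label_union(label, languages)}\n"
        "}\nLIMIT 100"
    )


print(species_by_label("Topinambur", ["de", "en", "fr"]))
```

The language codes still have to come from somewhere, so this only removes the typing, not the need for a language list.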
This answer proposes three options:
Be more specific.
The ?entity wdt:P171+ wd:Q25314 pattern seems to be sufficiently selective in your case.
Wait until they implement full-text search.
Use Quarry (example query).
Another option is to use Virtuoso full-text search capabilities on wikidata.dbpedia.org:
SELECT ?s WHERE {
?resource rdfs:label ?label .
?label bif:contains "'topinambur'" .
BIND ( IRI ( REPLACE ( STR(?resource),
"http://wikidata.dbpedia.org/resource",
"http://www.wikidata.org/entity"
)
) AS ?s
)
}
It seems that even the query below sometimes works on wikidata.dbpedia.org without running into a timeout:
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?resource WHERE {
?resource rdfs:label ?label .
FILTER ( STR(?label) = "Topinambur" ) .
}
Two hours ago I removed this statement on Wikidata:
wd:Q161378 rdfs:label "topinambur"@ru .
I'm not a botanist, but 'topinambur' is definitely not a word in Russian.
Working further on from @quick's answer, and showing it for lexemes rather than labels. First identifying relevant language codes:
SELECT (GROUP_CONCAT(?mword; separator=" ") AS ?mwords) {
BIND(1 AS ?dummy)
VALUES ?word { "topinambur" }
{
SELECT (COUNT(?lexeme) AS ?count) ?language_code {
?lexeme dct:language / wdt:P424 ?language_code .
}
GROUP BY ?language_code
HAVING (?count > 100)
ORDER BY DESC(?count)
}
BIND(CONCAT('"', ?word, '"@', ?language_code) AS ?mword)
}
GROUP BY ?dummy
Followed by the verbose query
SELECT (COUNT(?lexeme) AS ?count) ?language (GROUP_CONCAT(?word; separator=" ") AS ?words) {
VALUES ?word { "topinambur"@eo "topinambur"@ko "topinambur"@bfi "topinambur"@nl "topinambur"@uk "topinambur"@cy "topinambur"@pt "topinambur"@zh "topinambur"@br "topinambur"@bg "topinambur"@ms "topinambur"@tg "topinambur"@se "topinambur"@ta "topinambur"@non "topinambur"@it "topinambur"@zh-min-nan "topinambur"@nan "topinambur"@fi "topinambur"@jbo "topinambur"@ml "topinambur"@ja "topinambur"@ku "topinambur"@bn "topinambur"@ar "topinambur"@nb "topinambur"@es "topinambur"@pl "topinambur"@nn "topinambur"@sk "topinambur"@da "topinambur"@de "topinambur"@cs "topinambur"@fr "topinambur"@sv "topinambur"@eu "topinambur"@he "topinambur"@la "topinambur"@en "topinambur"@ru }
?lexeme dct:language ?language ;
ontolex:lexicalForm / ontolex:representation ?word .
}
GROUP BY ?language
For querying on labels, do something similar to:
SELECT (COUNT(?item) AS ?count) ?language (GROUP_CONCAT(?word; separator=" ") AS ?words) {
VALUES ?word { "topinambur"@eo "topinambur"@ko "topinambur"@bfi "topinambur"@nl "topinambur"@uk "topinambur"@cy "topinambur"@pt "topinambur"@zh "topinambur"@br "topinambur"@bg "topinambur"@ms "topinambur"@tg "topinambur"@se "topinambur"@ta "topinambur"@non "topinambur"@it "topinambur"@zh-min-nan "topinambur"@nan "topinambur"@fi "topinambur"@jbo "topinambur"@ml "topinambur"@ja "topinambur"@ku "topinambur"@bn "topinambur"@ar "topinambur"@nb "topinambur"@es "topinambur"@pl "topinambur"@nn "topinambur"@sk "topinambur"@da "topinambur"@de "topinambur"@cs "topinambur"@fr "topinambur"@sv "topinambur"@eu "topinambur"@he "topinambur"@la "topinambur"@en "topinambur"@ru }
?item rdfs:label ?word .
BIND(LANG(?word) AS ?language)
}
GROUP BY ?language

Query for resources using their URIs

I have a bunch of resource URIs, and I need the property values related to each of them. For a single resource, say <http://my.url/res#resourceUri>, I can write this query:
PREFIX v: <http://my.url/res#>
SELECT ?name
WHERE {
<http://my.url/res#resourceUri> a v:t;
rdfs:label ?name .
}
For multiple resources, I can use UNION, like this:
PREFIX v: <http://my.url/res#>
SELECT ?name
WHERE {
{ <http://my.url/res#resourceUri> a v:t; rdfs:label ?name } UNION
{ <http://my.url/res#anotherResource> a v:t; rdfs:label ?name }
}
Is there a way to write a shorter, leaner version of this second query?
You can use VALUES for this. Your example would be written as
PREFIX v: <http://my.url/res#>
SELECT ?resource ?name WHERE {
values ?resource { <http://my.url/res#resourceUri>
<http://my.url/res#anotherResource> }
?resource a v:t;
rdfs:label ?name
}
The question is different, but the answer to how to use Union/or in sparql path with arbitrary length? is similar.
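If the list of URIs lives in application code, the VALUES block can be generated instead of typed. The following Python sketch is one possible way to do that; `values_clause` and `labels_query` are hypothetical helper names, and the character check is a deliberately crude guard against pasting something that is not a plain IRI, not a full IRI validator.

```python
def values_clause(var, uris):
    """Render a SPARQL VALUES block binding var to each IRI in uris."""
    for uri in uris:
        # Reject characters that cannot appear inside <...> in SPARQL.
        if any(ch in uri for ch in '<>"{}|^` \n\t\\'):
            raise ValueError(f"not a safe IRI: {uri!r}")
    body = "\n    ".join(f"<{uri}>" for uri in uris)
    return f"VALUES {var} {{\n    {body}\n  }}"


def labels_query(uris):
    """Build the label query from the answer for any number of resources."""
    return (
        "PREFIX v: <http://my.url/res#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?resource ?name WHERE {\n"
        f"  {values_clause('?resource', uris)}\n"
        "  ?resource a v:t ;\n"
        "            rdfs:label ?name .\n"
        "}"
    )


print(labels_query([
    "http://my.url/res#resourceUri",
    "http://my.url/res#anotherResource",
]))
```

Selecting `?resource` alongside `?name` keeps it clear which row belongs to which input URI.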

DBpedia SPARQL Query US Universities

I created a SPARQL query that I'm running on the DBpedia SNORQL SPARQL endpoint. The purpose of the query is to get a list of universities or colleges in the United States, including their longitude, latitude, and endowment. The query seems to be working but seems to be missing some records and/or attributes. So, for example, Harvard University doesn't show up in the result, even though its DBpedia record exists and the attributes should match my query. I'm not sure why that record doesn't show up. Another example is University of Massachusetts Boston, which comes up as a query result, but the result doesn't get the longitude and latitude attributes, even though the record contains those attributes. Here's the SPARQL Query:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX d: <http://dbpedia.org/ontology/>
SELECT ?uni ?link ?lat ?long ?endowment
WHERE {
?s foaf:homepage ?link ;
rdf:type <http://schema.org/CollegeOrUniversity> ;
rdfs:label ?uni
OPTIONAL {?s geo:lat ?lat ;
geo:long ?long .
?s d:endowment ?endowment . }
FILTER (LANGMATCHES(LANG(?uni), 'en'))
{?s dbpedia2:country "U.S."@en . }
UNION
{?s dbpedia2:country "U.S." . }
UNION
{?s d:country :United_States . }
}
ORDER BY ?s
The query you posted will only select entities with a foaf:homepage and Harvard University does not have one. (That is, the resource does not have a foaf:homepage property. Obviously the university does have a homepage.) UMass Boston doesn't match the optional pattern --
OPTIONAL {?s geo:lat ?lat ;
geo:long ?long .
?s d:endowment ?endowment . }
-- because that pattern only matches when ?s has a geo:lat, a geo:long, and a d:endowment. Though the pattern is optional, the whole pattern must either match or not; you do not get partial matches.
Here's your query, reworked to use the built-in namespaces that the DBpedia SPARQL endpoint currently supports (that list is subject to change over time), with the OPTIONAL parts broken down as necessary and moved to the end. (Moving them to the end is just an aesthetic consideration.) I tried various constraints, and it is interesting to note that only 32 universities have dbpprop:country "U.S."@en, but 273 have dbpprop:country "United States"@en. There are 7620 results in total.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?label ?homepage ?lat ?long ?endowment
WHERE {
?school a <http://schema.org/CollegeOrUniversity>
{ ?school dbpedia-owl:country dbpedia:United_States }
UNION
{ ?school dbpprop:country dbpedia:United_States }
UNION
{ ?school dbpprop:country "U.S."@en }
UNION
{ ?school dbpprop:country "United States"@en }
OPTIONAL { ?school rdfs:label ?label .
FILTER (LANGMATCHES(LANG(?label), 'en')) }
OPTIONAL { ?school foaf:homepage ?homepage }
OPTIONAL { ?school geo:lat ?lat ; geo:long ?long }
OPTIONAL { ?school dbpedia-owl:endowment ?endowment }
}
You are looking for foaf:homepage, but some of them do not have this assigned. That is the first thing that caught my eye. Check the rest of the query by removing each element bit by bit and seeing what the result set has to offer.

querying dbpedia sparql for more results

I want to obtain some data from DBpedia.
I have entity URLs and want to get some information about their location.
Currently I run a query like this:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT * WHERE
{
<{0}> rdfs:label ?label .
OPTIONAL {
<{0}> geo:lat ?lat ;
geo:long ?long .
} .
OPTIONAL {
<{0}> dbo:Country ?dboCountry
} .
OPTIONAL {
<{0}> dbpedia-owl:country ?dbpediaContry .
?dbpediaContry dbpprop:cctld ?ccTLD
}.
OPTIONAL {
<{0}> dbpprop:country ?dbpropContry
}
FILTER ( lang(?label) = "en" )
}
for each URL (replacing {0} with the URL).
But I would like to optimize it and get results for several entities in one query.
Also, is it possible to avoid repeating the URL on each line?
Regards
Piotr
Hmm, it looks like I have already found an answer to both questions.
Do you know this? (http://en.wikipedia.org/wiki/Rubber_duck_debugging)
The solution is:
SELECT DISTINCT *
WHERE {
?uri rdfs:label ?label .
OPTIONAL { ?uri geo:lat ?lat .
?uri geo:long ?long} .
FILTER (?uri IN ({0}, {1}, ...) )
}
Maybe it will be helpful for somebody else?
Or maybe someone knows better solution?
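For a long list of URLs it may be worth splitting the list into batches, so that a single query does not grow too large for the endpoint. Here is a hedged Python sketch of filling the IN list per batch; the helper names `chunks` and `in_filter_query`, the batch size of 2 and the example resource URLs are assumptions for illustration only. It just builds the strings.

```python
def chunks(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def in_filter_query(uris):
    """Fill the FILTER (?uri IN (...)) list with the given resource URLs."""
    iri_list = ", ".join(f"<{u}>" for u in uris)
    return (
        "PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT * WHERE {\n"
        "  ?uri rdfs:label ?label .\n"
        "  OPTIONAL { ?uri geo:lat ?lat ; geo:long ?long }\n"
        f"  FILTER (?uri IN ({iri_list}))\n"
        "}"
    )


# Hypothetical input list; in practice these are the entity URLs you hold.
urls = [
    "http://dbpedia.org/resource/Vancouver",
    "http://dbpedia.org/resource/Leipzig",
    "http://dbpedia.org/resource/Boston",
]
for batch in chunks(urls, 2):
    print(in_filter_query(batch))
    print("---")
```

A suitable batch size depends on the endpoint's limits, so treat the value here as a placeholder to tune.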