I am using the following query to get the Wikidata ID from a DBpedia page using owl:sameAs.
SELECT distinct ?wikidata_concept
WHERE {<http://dbpedia.org/resource/Category:Michael_Jackson> owl:sameAs ?wikidata_concept
FILTER(regex(str(?wikidata_concept), "www.wikidata.org" ) )}
LIMIT 100
It works fine in the Virtuoso SPARQL Query Editor: I get http://www.wikidata.org/entity/Q7215695 as the answer, which is correct.
However, when I run the same query with SPARQLWrapper in Python, I don't get that answer (the data frame is empty).
My Python code is as follows.
import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://live.dbpedia.org/sparql")
item = "http://dbpedia.org/resource/Category:Michael_Jackson"
sparql.setQuery(f"SELECT distinct ?wikidata_concept WHERE {{<{item}> owl:sameAs ?wikidata_concept FILTER(regex(str(?wikidata_concept), \"www.wikidata.org\" ) )}} LIMIT 100")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print(results)
results_df = pd.json_normalize(results['results']['bindings'])
print(results_df)
Please let me know where I am going wrong. I am happy to provide more details if needed.
DBpedia has two public endpoints that serve different datasets: the regular release at http://dbpedia.org/sparql and DBpedia Live at http://live.dbpedia.org/sparql. I got different results because the query editor and my Python code were pointing at different ones.
Changing sparql = SPARQLWrapper("http://live.dbpedia.org/sparql") to sparql = SPARQLWrapper("http://dbpedia.org/sparql") solved my issue, since the query editor and my Python code now use the same DBpedia version.
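For reference, the fix can be sketched as below. This is a minimal illustration, not the original poster's code: the helper names build_sameas_query, extract_values and fetch_wikidata_ids are made up for this example, and the network call assumes the SPARQLWrapper package is installed and the endpoint is reachable.

```python
"""Sketch: query the same DBpedia endpoint that the web query editor uses."""

# The regular DBpedia endpoint (the fix described in the answer above).
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def build_sameas_query(resource_uri):
    """Return the owl:sameAs query from the question for one resource URI."""
    # owl: is declared explicitly so the query does not rely on endpoint defaults.
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT DISTINCT ?wikidata_concept WHERE {\n"
        f"  <{resource_uri}> owl:sameAs ?wikidata_concept .\n"
        '  FILTER(regex(str(?wikidata_concept), "www.wikidata.org"))\n'
        "} LIMIT 100"
    )


def extract_values(results, variable):
    """Flatten a SPARQL JSON result into a plain list of values for one variable."""
    return [
        binding[variable]["value"]
        for binding in results["results"]["bindings"]
        if variable in binding
    ]


def fetch_wikidata_ids(resource_uri, endpoint=DBPEDIA_ENDPOINT):
    """Run the query over the network (requires SPARQLWrapper)."""
    # Imported here so the pure helpers above work without the dependency.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(build_sameas_query(resource_uri))
    sparql.setReturnFormat(JSON)
    return extract_values(sparql.query().convert(), "wikidata_concept")
```

Calling fetch_wikidata_ids("http://dbpedia.org/resource/Category:Michael_Jackson") runs the question's query against the regular endpoint; passing endpoint="http://live.dbpedia.org/sparql" instead makes it easy to compare the two datasets.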
In my Django project I have a list of Algerian cities in my database, and I recently added latitude and longitude fields to the city table.
I want to use the Wikidata API to fill in the coordinates of each of those cities.
I can figure out the Django and Python side on my own, but I'm new to SPARQL, so I need help with the query itself. How do I achieve that?
This is what I have so far:
import sys
from SPARQLWrapper import SPARQLWrapper, JSON
endpoint_url = "https://query.wikidata.org/sparql"
query = """
#sparql query here
"""
def get_results(endpoint_url, query):
    # Identify the client to the Wikidata Query Service with a User-Agent string.
    user_agent = "WDQS-example Python/%s.%s" % (sys.version_info[0], sys.version_info[1])
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()
results = get_results(endpoint_url, query)
for result in results["results"]["bindings"]:
    # my logic here
    pass
I chose not to include my own attempt at the query because I don't think it would be a good starting point at all.
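One possible starting point for the query, offered as a sketch rather than a definitive answer. It assumes the Wikidata identifiers Q262 (Algeria), Q515 (city), P17 (country), P31 (instance of), P279 (subclass of) and P625 (coordinate location); verify them on wikidata.org before relying on them. The query service returns coordinates as WKT literals of the form Point(longitude latitude), so the hypothetical helpers parse_point and rows_to_coordinates convert them into (latitude, longitude) pairs.

```python
import re

# Candidate query: cities in Algeria that have a coordinate location.
# The wd:, wdt:, wikibase: and bd: prefixes are predefined on query.wikidata.org.
ALGERIAN_CITIES_QUERY = """
SELECT ?city ?cityLabel ?coord WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;   # instance of (a subclass of) city
        wdt:P17 wd:Q262 ;             # country: Algeria
        wdt:P625 ?coord .             # coordinate location
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

# Matches WKT points such as "Point(3.0588 36.7538)" (longitude first).
_POINT = re.compile(
    r"^Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$", re.IGNORECASE
)


def parse_point(wkt):
    """Convert 'Point(lon lat)' into a (latitude, longitude) tuple of floats."""
    match = _POINT.match(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    longitude, latitude = (float(part) for part in match.groups())
    return latitude, longitude


def rows_to_coordinates(results):
    """Map each city label to (latitude, longitude) from a SPARQL JSON result."""
    return {
        row["cityLabel"]["value"]: parse_point(row["coord"]["value"])
        for row in results["results"]["bindings"]
    }
```

Passing ALGERIAN_CITIES_QUERY to get_results and the result to rows_to_coordinates gives a dictionary that can be matched against the city names in the database. Some places in the table may be typed differently on Wikidata (for example as municipalities rather than cities), in which case the class in the first triple pattern needs widening.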
I'm trying to write a simple SPARQL query generator to fetch all rdf:type relations of a specific DBPedia resource.
query = """SELECT * WHERE {{ <""" + resource + """> rdfs:type ?subject.}}"""
This yields the Query
SELECT * WHERE {{ <http://dbpedia.org/page/Energy> rdfs:type ?subject.}}
But the query returns empty. What am I doing wrong? The DBPedia entry clearly has rdfs:type relations:
owl:Thing
dbo:Building
yago:Abstraction100002137
yago:Assets113329641
yago:NaturalResource113332009
yago:Possession100032613
yago:Relation100031921
yago:Resource113331778
yago:WikicatNaturalResources
Thanks in advance!
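Change the Energy address from page to resource: http://dbpedia.org/page/Energy is the HTML view, while the data uses http://dbpedia.org/resource/Energy as the subject. Also note that rdfs:type is not a defined property; the type predicate is rdf:type, and I suggest using its shorthand a. The query then looks like this: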
SELECT * WHERE {{ <http://dbpedia.org/resource/Energy> a ?subject.}}
To avoid this issue, check the exact resource addresses in a raw data format. For example, the XML triples can be viewed in a web browser from the DBpedia page http://dbpedia.org/page/Energy, using the button named Formats in the top bar.
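A query generator can guard against this mistake by normalising the URL before building the query. The following is a minimal sketch; to_resource_uri and build_type_query are hypothetical helper names, and the only assumption is the /page/ versus /resource/ path convention described above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_resource_uri(url):
    """Rewrite a DBpedia /page/ URL to the /resource/ URI used in the data."""
    parts = urlsplit(url)
    if parts.path.startswith("/page/"):
        parts = parts._replace(path="/resource/" + parts.path[len("/page/"):])
    return urlunsplit(parts)


def build_type_query(url):
    """Build a query for all rdf:type values of a resource ('a' is rdf:type)."""
    resource = to_resource_uri(url)
    # Doubled braces in an f-string produce single literal braces.
    return f"SELECT ?type WHERE {{ <{resource}> a ?type . }}"


print(build_type_query("http://dbpedia.org/page/Energy"))
```

URLs that already use /resource/ pass through unchanged, so the helper is safe to apply to every input.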
I have a data source file in which one of the properties points to an actual class instance:
<clinic:Radiology rdf:ID="rad1234">
  <clinic:diagnosis>Stage 4</clinic:diagnosis>
  <clinic:ProvidedBy rdf:resource="#MountSinai"/>
  <clinic:ReceivedBy rdf:resource="#JohnSmith"/>
  <clinic:patientId>7890123</clinic:patientId>
  <clinic:radiologyDate>01-01-2017</clinic:radiologyDate>
</clinic:Radiology>
so clinic:ProvidedBy is pointing to this:
<clinic:Radiologists rdf:ID="MountSinai">
  <clinic:name>Mount Sinai</clinic:name>
  <clinic:npi>1234567</clinic:npi>
  <clinic:specialty>Oncology</clinic:specialty>
</clinic:Radiologists>
How do I query using the property clinic:ProvidedBy (which points to an instance of clinic:Radiologists)? Nothing I have tried returns any results.
It's not clear exactly what you want, so this answer returns "all radiology resources that are provided by MountSinai":
PREFIX clinic: <THE_NAMESPACE_OF_THE_CLINIC_PREFIX>
PREFIX : <THE_BASE_NAMESPACE_OF_YOUR_RDF_DOCUMENT>
SELECT DISTINCT ?s WHERE {
  ?s clinic:ProvidedBy :MountSinai
}
But I really suggest starting with an RDF and SPARQL tutorial, since, from your comment, your query
SELECT * WHERE { ?x rdf:resource "#MountSinai" }
is missing fundamental SPARQL basics. When writing a matching SPARQL query, it's always good to look at the data in Turtle or N-Triples format, both of which are closer to the SPARQL syntax.
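The reason :MountSinai matches is that, in RDF/XML, rdf:ID="MountSinai" and rdf:resource="#MountSinai" both resolve against the document's base URI to the same absolute URI. The sketch below illustrates that resolution with Python's standard library; the base URI http://example.org/clinic-data is a hypothetical placeholder, and the two function names are made up for this example.

```python
from urllib.parse import urljoin

# Hypothetical base URI of the RDF/XML document; substitute the real one.
BASE = "http://example.org/clinic-data"


def resolve_rdf_id(base, rdf_id):
    """rdf:ID="x" names the resource <base#x>."""
    return urljoin(base, "#" + rdf_id)


def resolve_rdf_resource(base, reference):
    """rdf:resource="#x" is a relative reference resolved against the base."""
    return urljoin(base, reference)


subject = resolve_rdf_id(BASE, "MountSinai")
target = resolve_rdf_resource(BASE, "#MountSinai")
print(subject)
print(target == subject)
```

With this base, declaring PREFIX : <http://example.org/clinic-data#> makes :MountSinai expand to that same URI. The string literal "#MountSinai" never appears in the parsed data, and rdf:resource is RDF/XML syntax rather than a property, which is why the query quoted from the comment finds nothing.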
I am trying to teach myself this weekend how to run API queries against a data source, in this case data.gov. At first I thought I'd use a simple SQL variant, but it seems that in this case I have to use SPARQL.
I've read through the documentation and downloaded Twinkle, but I can't quite get it to run. Here is an example of a query I'm running. I'm basically trying to find all gas stations around Denver, CO whose network is null.
PREFIX station: https://api.data.gov/nrel/alt-fuel-stations/v1/nearest.json?api_key=???location=Denver+CO
SELECT *
WHERE
{ ?x station:network ?network like "null"
}
Any help would be very much appreciated.
SPARQL is a graph pattern language for RDF triples. A query consists of a set of "basic graph patterns" described by triple patterns of the form <subject>, <predicate>, <object>. RDF defines the subject and predicate with URIs, and the object is either a URI (object property) or a literal (datatype or language-tagged property). Each triple pattern in a query must therefore have three entities.
Since we don't have any examples of your data, I'll provide a way to explore the data a bit. Let's assume your prefix is correctly defined, which I doubt: it will not be the REST API URL, but the URI of the entity itself. Then you can try the following:
PREFIX station: <http://api.data.gov/nrel...>
SELECT *
WHERE
{ ?s station:network ?network .
}
...with the PREFIX set to the actual namespace of network. Then look at the bindings for ?network to find out how the data represents null. Let's say it is a string, as you show. Then the query would look like:
PREFIX station: <http://api.data.gov/nrel...>
SELECT ?s
WHERE
{ ?s station:network "null" .
}
There is no LIKE operator in SPARQL, but you can use a FILTER clause with regex or SPARQL's other string-matching functions.
And please, please, please google "SPARQL" and "RDF". There is lots of information about SPARQL, and the W3C's SPARQL 1.1 Query Language Recommendation is a comprehensive source with many good examples.
I use this SPARQL query to get as many cities as possible:
select * where {
?city rdf:type dbo:PopulatedPlace
}
However, some expected ones are missing, e.g.
http://dbpedia.org/resource/Heidelberg
(neither that nor any of its wikiRedirects is returned),
even though it is a dbo:PopulatedPlace, as this query returns true (in JSON):
ask {
:Heidelberg a dbo:PopulatedPlace
}
I need that list to be exhaustive because later I will add constraints based on user input.
I use http://dbpedia.org/snorql/ to test the queries.
Any help is appreciated.
UPDATE:
One of the devs told me that the public endpoint limits the number of results (to about 1K).
I'll come up with a paginated solution and see if it contains the 'outlier'.
UPDATE2:
The outlier is definitely in the result set of rdf:type dbo:Town.
Using dbo:PopulatedPlace yields too many results to check by hand, though.
The public endpoint limits results to about 1K. Pagination or use of a smaller subclass of dbo:PopulatedPlace yields the result.
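The pagination approach can be sketched as follows. This is an illustration under assumptions, not a tested client for the public endpoint: paginate is a hypothetical helper, run_query stands for any callable that sends a query string and returns its list of bindings (for example a thin SPARQLWrapper wrapper), and the page size of 1000 simply mirrors the limit mentioned above.

```python
def paginate(run_query, base_query, page_size=1000, max_pages=None):
    """Yield all bindings by re-running base_query with LIMIT/OFFSET pages.

    run_query is any callable that takes a query string and returns the
    list of bindings for that query.
    """
    page = 0
    while max_pages is None or page < max_pages:
        query = f"{base_query}\nLIMIT {page_size} OFFSET {page * page_size}"
        rows = run_query(query)
        yield from rows
        if len(rows) < page_size:
            break  # a short page means the result set is exhausted
        page += 1


# ORDER BY keeps the row order stable so consecutive pages do not overlap.
CITY_QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city WHERE { ?city a dbo:PopulatedPlace }
ORDER BY ?city"""
```

Iterating over paginate(run_query, CITY_QUERY) then walks the full result set one page at a time. Sorting a large result set can be slow on a public endpoint, so restricting the query to a smaller subclass such as dbo:Town, as noted above, may still be the more practical option.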