Hey, in my Django project I have a list of Algerian cities in my database. I recently added latitude and longitude fields to the city table.
What I want to do is use the Wikidata API to fill in the coordinates of each city in my database, all of which are Algerian cities.
I can figure out the Django and Python side on my own, but I'm new to SPARQL, so I need help with the SPARQL part. How do I achieve that?
This is what I have so far:
import sys
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint_url = "https://query.wikidata.org/sparql"
query = """
#sparql query here
"""

def get_results(endpoint_url, query):
    user_agent = "WDQS-example Python/%s.%s" % (sys.version_info[0], sys.version_info[1])
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

results = get_results(endpoint_url, query)
for result in results["results"]["bindings"]:
    # my logic here
    pass
I preferred not to include my attempt at the query, because I don't think it would be a good starting point.
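One possible starting point for the SPARQL part, offered as a sketch rather than a definitive answer. It assumes the Wikidata identifiers Q262 (Algeria), Q515 (city), P17 (country), P31 (instance of), P279 (subclass of) and P625 (coordinate location). The helper names parse_point and rows_from_bindings are made up for illustration. The query service returns coordinates as WKT literals of the form Point(longitude latitude), so the helper swaps them into (latitude, longitude):

```python
import re

# Assumed Wikidata identifiers: Q262 = Algeria, Q515 = city,
# P17 = country, P31 = instance of, P279 = subclass of, P625 = coordinate location.
query = """
SELECT ?city ?cityLabel ?coord WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;
        wdt:P17 wd:Q262 ;
        wdt:P625 ?coord .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}
"""

# One signed decimal number, optionally with an exponent.
_NUM = r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
# WKT point literal as returned by the query service: "Point(longitude latitude)".
_POINT_RE = re.compile(r"Point\(\s*(" + _NUM + r")\s+(" + _NUM + r")\s*\)")


def parse_point(wkt):
    """Return (latitude, longitude) from a WKT point literal, or None if it does not match."""
    match = _POINT_RE.fullmatch(wkt.strip())
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def rows_from_bindings(bindings):
    """Yield (label, latitude, longitude) for each binding with a parsable coordinate."""
    for binding in bindings:
        point = parse_point(binding["coord"]["value"])
        if point is not None:
            yield binding["cityLabel"]["value"], point[0], point[1]
```

The rows can then be matched against the city names in the database. Some localities may be typed as something other than a city on Wikidata, so the class in the first triple pattern may need adjusting, and matching by label alone can be ambiguous.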
What would be the code to easily get all the states (second subdivisions) of a country?
The pattern from OSMnx is, more or less:
division       admin_level
country        2
region         3
state          4
city           8
neighborhood   10
For example, to get all the neighborhoods of a city:
import pandas as pd
import geopandas as gpd
import osmnx as ox
place = 'Rio de Janeiro'
tags = {'admin_level': '10'}
gdf = ox.geometries_from_place(place, tags)
Wouldn't the same apply if one wants the states of a country?
place = 'Brasil'
tags = {'admin_level': '4'}
gdf = ox.geometries_from_place(place, tags)
I'm not even sure whether this snippet works, because I let it run for 4 hours and it never finished. Maybe the package isn't made for downloading big chunks of data, or there's a more efficient solution than ox.geometries_from_place() for that task, or there's more information I could add to the tags. Help is appreciated.
OSMnx can potentially get all the states or provinces from some country, but this isn't a use case it's optimized for, and your specific use creates a few obstacles. You can see your query reproduced on Overpass Turbo.
You're using the default query area size, so it's making thousands of requests
Brazil's bounding box intersects portions of overseas French territory, which in turn pulls in all of France (spanning the entire globe)
OSMnx uses an r-tree to filter the final results, but globe-spanning results make this index perform very slowly
OSMnx can acquire geometries either via the geometries module (as you're doing) or via the geocode_to_gdf function in the geocoder module. You may want to try the latter if it fits your use case, as it's far more efficient.
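As a rough sketch of the geocode_to_gdf route: it geocodes each place name to its boundary, so it assumes you already have a list of state names. The helper names state_queries and fetch_state_boundaries are made up for illustration, and the list of names is something you would supply yourself.

```python
def state_queries(state_names, country):
    """Build one geocoder query string per state, e.g. 'Canelones, Uruguay'."""
    return [f"{name}, {country}" for name in state_names]


def fetch_state_boundaries(state_names, country):
    """Geocode each state to its boundary; needs osmnx installed and network access."""
    # Imported lazily so state_queries() stays usable without osmnx.
    import osmnx as ox

    # geocode_to_gdf accepts a list of queries and returns one row per query.
    return ox.geocode_to_gdf(state_queries(state_names, country))
```

For instance, fetch_state_boundaries(["Canelones", "Montevideo"], "Uruguay") should return a GeoDataFrame with two rows. It is worth checking that each geocoded result really is the administrative boundary you expect, since a name can match more than one feature.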
With that in mind, if you must use the geometries module, you can try a few things to improve performance. First off, adjust the query area so you're downloading everything with one single API request. You're downloading relatively few entities, so the huge query area should still be ok within the timeout interval. The "intersecting overseas France" and "globe-spanning r-tree" problems are harder to solve. But as a demonstration, here's a simple example with Uruguay instead. It takes about 20 seconds to run everything on my machine:
import osmnx as ox
ox.settings.log_console = True
ox.settings.max_query_area_size = 25e12
place = 'Uruguay'
tags = {'admin_level': '4'}
gdf = ox.geometries_from_place(place, tags)
gdf = gdf[gdf["is_in:country"] == place]
gdf.plot()
I'm looking for a way to run a text query, e.g. "g.V().limit(1).toList()", while using the PartitionStrategy in Apache TinkerPop.
I'm attempting to build a REST interface to run queries on selected graph partitions only. I know how to run a raw query using Client, but I'm looking for an implementation where I can create a multi-tenant graph (https://tinkerpop.apache.org/docs/current/reference/#partitionstrategy) and query only selected tenants using a raw text query instead of a GLV. I'm able to query only selected partitions using gremlinpython, but I could not find a reference implementation for running a text query on a tenant.
Here is the tenant query implementation:
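from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.strategies import PartitionStrategy

connection = DriverRemoteConnection('ws://megamind-ws:8182/gremlin', 'g')
g = traversal().withRemote(connection)
partition = PartitionStrategy(partition_key="partition_key",
                              write_partition="tenant_a",
                              read_partitions=["tenant_a"])
partitioned_g = g.withStrategies(partition)
x = partitioned_g.V().limit(1).next()  # query on the partition only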
Here is how I execute a raw query on the entire graph, but I'm looking for a way to run text-based queries on selected partitions only.
from gremlin_python.driver import client
client = client.Client('ws://megamind-ws:8182/gremlin', 'g')
results = client.submitAsync("g.V().limit(1).toList()").result().one()  # runs on the entire graph
print(results)
client.close()
Any suggestions appreciated. TIA
It depends on how the backend store handles text mode queries, but for the query itself, essentially you just need to use the Groovy/Java style formulation. This will work with GremlinServer and Amazon Neptune. For other backends you will need to make sure that this syntax is supported. So from Python you would use something like:
client.submit('''
g.withStrategies(new PartitionStrategy(partitionKey: "_partition",
                                       writePartition: "b",
                                       readPartitions: ["b"])).V().count()''')
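For a REST interface where the tenant is chosen per request, the same script can be assembled from parameters. This is only a sketch under the assumptions above (a backend that accepts the Groovy-style PartitionStrategy syntax). The helper name partitioned_script is made up, and the name check is a simple precaution I added, not something TinkerPop requires:

```python
import json
import re

# Allow only simple identifiers so a tenant name cannot break out of the script string.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def partitioned_script(traversal_tail, partition_key, write_partition, read_partitions):
    """Return a Groovy script that runs `traversal_tail` under a PartitionStrategy.

    `traversal_tail` is everything after "g." (for example "V().limit(1).toList()")
    and is NOT validated here, so it must come from a trusted source.
    """
    for name in [partition_key, write_partition, *read_partitions]:
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsafe partition name: {name!r}")
    # json.dumps produces double-quoted strings, which Groovy also accepts.
    reads = ", ".join(json.dumps(p) for p in read_partitions)
    return (
        "g.withStrategies(new PartitionStrategy("
        f'partitionKey: "{partition_key}", '
        f'writePartition: "{write_partition}", '
        f"readPartitions: [{reads}]))."
        f"{traversal_tail}"
    )
```

The resulting string can then be passed to the driver as before, e.g. client.submit(partitioned_script("V().limit(1).toList()", "partition_key", "tenant_a", ["tenant_a"])).all().result().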
I'm trying to write a simple SPARQL query generator to fetch all rdf:type relations of a specific DBpedia resource.
query = """SELECT * WHERE {{ <""" + resource + """> rdfs:type ?subject.}}"""
This yields the query
SELECT * WHERE {{ <http://dbpedia.org/page/Energy> rdfs:type ?subject.}}
But the query returns empty. What am I doing wrong? The DBPedia entry clearly has rdfs:type relations:
owl:Thing
dbo:Building
yago:Abstraction100002137
yago:Assets113329641
yago:NaturalResource113332009
yago:Possession100032613
yago:Relation100031921
yago:Resource113331778
yago:WikicatNaturalResources
Thanks in advance!
Change the Energy address from page to resource. The query then looks like this (in addition, I suggest using a instead of rdf:type; note that the property is rdf:type, not rdfs:type):
SELECT * WHERE {{ <http://dbpedia.org/resource/Energy> a ?subject.}}
To avoid this issue, check the exact resource addresses in a raw data format. For example, the XML triples can be viewed in a web browser from the DBpedia page http://dbpedia.org/page/Energy, where the top bar has a button named Formats.
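In a query generator, the page-to-resource fix can be applied automatically. A minimal sketch, with made-up helper names to_resource_uri and build_type_query; it uses an f-string, where doubled braces produce single literal braces in the output:

```python
PAGE_PREFIX = "http://dbpedia.org/page/"
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def to_resource_uri(uri):
    """Map a DBpedia HTML page address to the resource URI used in the RDF data."""
    if uri.startswith(PAGE_PREFIX):
        return RESOURCE_PREFIX + uri[len(PAGE_PREFIX):]
    return uri


def build_type_query(uri):
    """Build a query listing the rdf:type values of the resource ('a' is shorthand for rdf:type)."""
    return f"SELECT ?type WHERE {{ <{to_resource_uri(uri)}> a ?type . }}"
```

Calling build_type_query("http://dbpedia.org/page/Energy") gives a query against http://dbpedia.org/resource/Energy, which is the subject the triples are actually stored under.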
I am using the following query to get the Wikidata ID from a DBpedia page using owl:sameAs.
SELECT distinct ?wikidata_concept
WHERE {<http://dbpedia.org/resource/Category:Michael_Jackson> owl:sameAs ?wikidata_concept
FILTER(regex(str(?wikidata_concept), "www.wikidata.org" ) )}
LIMIT 100
It works fine in the Virtuoso SPARQL Query Editor. I get http://www.wikidata.org/entity/Q7215695 as the answer, which is correct.
However, when I try to do the same using SPARQLWrapper in Python, I don't get the above answer (basically, the data frame is empty).
My Python code is as follows.
import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://live.dbpedia.org/sparql")
item = "http://dbpedia.org/resource/Category:Michael_Jackson"
sparql.setQuery(f"SELECT distinct ?wikidata_concept WHERE {{<{item}> owl:sameAs ?wikidata_concept FILTER(regex(str(?wikidata_concept), \"www.wikidata.org\" ) )}} LIMIT 100")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print(results)
results_df = pd.json_normalize(results['results']['bindings'])
print(results_df)
Please let me know where I am going wrong. I am happy to provide more details if needed.
DBpedia has two versions (the main endpoint and DBpedia Live), so I got different results because the two approaches were querying different versions.
Changing sparql = SPARQLWrapper("http://live.dbpedia.org/sparql") to sparql = SPARQLWrapper("http://dbpedia.org/sparql") solved my issue. Now I am using the same DBpedia version in the query editor and in my Python code.
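When comparing endpoints like this, it can help to reduce the JSON response to plain IDs, so that an empty result is easy to tell apart from a parsing problem. A small sketch, assuming the standard SPARQL JSON results layout that SPARQLWrapper returns; the helper name wikidata_ids is made up:

```python
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def wikidata_ids(results, variable="wikidata_concept"):
    """Return the Wikidata IDs (e.g. 'Q7215695') bound to `variable` in a SPARQL JSON result."""
    ids = []
    for binding in results["results"]["bindings"]:
        value = binding.get(variable, {}).get("value", "")
        # Keep only values that are Wikidata entity URIs and strip the prefix.
        if value.startswith(WIKIDATA_ENTITY_PREFIX):
            ids.append(value[len(WIKIDATA_ENTITY_PREFIX):])
    return ids
```

An empty list from wikidata_ids(results) then means the endpoint itself returned no matching owl:sameAs links, which is what happened with the live endpoint here.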
After I query DBpedia over its SPARQL endpoint, I get results as Jena ResourceImpl objects. How can I then get the details of such a resource? For example, if the resource is a person, how can I get his or her birthDate?
I tried this, but it always returns null.
QuerySolution querySolution = resultSet.next();
RDFNode x = querySolution.get("x");
ResourceImpl resource = (ResourceImpl) x;
Property property = new PropertyImpl("http://dbpedia.org/property/birthDate");
Resource propertyResourceValue = resource.getPropertyResourceValue(property); // NULL
Likely you will need to make a subsequent SPARQL query if you want to get further details about the resource. E.g.,
String nextQuery = "DESCRIBE " + FmtUtils.stringForNode(resource.asNode(), (SerializationContext)null);
Query describeQuery = QueryFactory.create(nextQuery);
QueryExecution exec = QueryExecutionFactory.sparqlService("http://endpoint", describeQuery);
Model m = exec.execDescribe();
You should then be able to use the resource API over the resulting model to get the information you want.
Assuming your SPARQL query is a SELECT ..., the ResultSet is a table and each QuerySolution is a row in that table. When you get the Resource from such a row, you only have a Resource; the properties are not automatically attached. Hence, getting a property value on the Resource returns null.
It looks like RDFOutput does what you expected: transform a SPARQL query ResultSet to an RDF Model.