I am trying to extract labels from DBpedia for some persons. I have been partially successful, but I am stuck on the following problem. The following code works.
public class DbPediaQueryExtractor {
public static void main(String [] args) {
String entity = "Aharon_Barak";
String queryString ="PREFIX dbres: <http://dbpedia.org/resource/> SELECT * WHERE {dbres:"+ entity+ "<http://www.w3.org/2000/01/rdf-schema#label> ?o FILTER (langMatches(lang(?o),\"en\"))}";
//String queryString="select * where { ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>; <http://www.w3.org/2000/01/rdf-schema#label> ?o FILTER (langMatches(lang(?o),\"en\")) } LIMIT 5000000";
QueryExecution qexec = getResult(queryString);
try {
ResultSet results = qexec.execSelect();
for ( ; results.hasNext(); )
{
QuerySolution soln = results.nextSolution();
System.out.print(soln.get("?o") + "\n");
}
}
finally {
qexec.close();
}
}
public static QueryExecution getResult(String queryString){
Query query = QueryFactory.create(queryString);
//VirtuosoQueryExecution vqe = VirtuosoQueryExecutionFactory.create (sparql, graph);
QueryExecution qexec = QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", query);
return qexec;
}
}
However, when the entity contains brackets, it does not work. For example,
String entity = "William_H._Miller_(writer)";
leads to this exception:
Exception in thread "main" com.hp.hpl.jena.query.QueryParseException: Encountered " "(" "( "" at line 1, column 86.
What is the problem?
It took some copying and pasting to see what exactly was going on. I'd suggest that you put newlines in your query for easier readability. The query you're using is:
PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
dbres:??? <http://www.w3.org/2000/01/rdf-schema#label> ?o
FILTER (langMatches(lang(?o),"en"))
}
where ??? is being replaced by the contents of the string entity. You're doing absolutely no input validation here to ensure that the value of entity will be legal to paste in. Based on your question, it sounds like entity contains William_H._Miller_(writer), so you're getting the query:
PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
dbres:William_H._Miller_(writer) <http://www.w3.org/2000/01/rdf-schema#label> ?o
FILTER (langMatches(lang(?o),"en"))
}
You can paste that into the public DBpedia endpoint, and you'll get a similar parse error message:
Virtuoso 37000 Error SP030: SPARQL compiler, line 6: syntax error at 'writer' before ')'
SPARQL query:
define sql:big-data-const 0
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri <http://dbpedia.org> PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
dbres:William_H._Miller_(writer) <http://www.w3.org/2000/01/rdf-schema#label> ?o
FILTER (langMatches(lang(?o),"en"))
}
Better than hitting DBpedia's endpoint with bad queries, you can also use the SPARQL query validator, which reports for that query:
Syntax error: Lexical error at line 4, column 34. Encountered: ")" (41), after : "writer"
In Jena, you can use the ParameterizedSparqlString to avoid these sorts of issues. Here's your example, reworked to use a parameterized string:
import com.hp.hpl.jena.query.ParameterizedSparqlString;
public class PSSExample {
public static void main( String[] args ) {
// Create a parameterized SPARQL string for the particular query, and add the
// dbres prefix to it, for later use.
final ParameterizedSparqlString queryString = new ParameterizedSparqlString(
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
"SELECT * WHERE\n" +
"{\n" +
" ?entity rdfs:label ?o\n" +
" FILTER (langMatches(lang(?o),\"en\"))\n" +
"}\n"
) {{
setNsPrefix( "dbres", "http://dbpedia.org/resource/" );
}};
// Entity is the same.
final String entity = "William_H._Miller_(writer)";
// Now retrieve the URI for dbres, concatenate it with entity, and use
// it as the value of ?entity in the query.
queryString.setIri( "?entity", queryString.getNsPrefixURI( "dbres" )+entity );
// Show the query.
System.out.println( queryString.toString() );
}
}
The output is:
PREFIX dbres: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE
{
<http://dbpedia.org/resource/William_H._Miller_(writer)> rdfs:label ?o
FILTER (langMatches(lang(?o),"en"))
}
You can run this query at the public endpoint and get the expected results. Notice that if you use an entity that doesn't need special escaping, e.g.,
final String entity = "George_Washington";
then the query output will use the prefixed form:
PREFIX dbres: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE
{
dbres:George_Washington rdfs:label ?o
FILTER (langMatches(lang(?o),"en"))
}
This is very convenient, because you don't have to do any checking about whether your suffix, i.e., entity, has any characters that need to be escaped; Jena takes care of that for you.
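To make the difference between the two outputs concrete, here is a minimal stand-alone sketch of the same decision in plain Java, without Jena. It is not Jena's actual implementation: the class name SparqlTermSketch, the method term, and the deliberately conservative local-name rule (ASCII letters, digits, and underscores only, which is stricter than the SPARQL grammar) are assumptions made for illustration.

```java
public class SparqlTermSketch {

    // Hypothetical helper: returns prefix:local when the local part is
    // obviously safe, and the full <IRI> form otherwise.
    public static String term(String prefix, String namespace, String local) {
        String iri = namespace + local;
        // Characters that may not appear between < and > in a SPARQL query.
        for (char c : iri.toCharArray()) {
            if (c <= 0x20 || "<>\"{}|^`\\".indexOf(c) >= 0) {
                throw new IllegalArgumentException("Not a legal IRI: " + iri);
            }
        }
        // Conservative assumption: only [A-Za-z_][A-Za-z0-9_]* is treated
        // as safe for the prefixed form.
        return local.matches("[A-Za-z_][A-Za-z0-9_]*")
                ? prefix + ":" + local
                : "<" + iri + ">";
    }

    public static void main(String[] args) {
        String ns = "http://dbpedia.org/resource/";
        System.out.println(term("dbres", ns, "George_Washington"));
        System.out.println(term("dbres", ns, "William_H._Miller_(writer)"));
    }
}
```

With the two entities from above, this prints dbres:George_Washington and <http://dbpedia.org/resource/William_H._Miller_(writer)>. In real code, prefer ParameterizedSparqlString, which already handles this.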
Related
I'm using Apache Jena to fetch a huge amount of data from DBpedia and write it into a CSV file. However, I'm only able to get about 10,000 triples and not the entire result set. I need it to fetch all triples matched by the query. I can't identify whether it is an endpoint timeout or something else. The code I've written is as follows:
public class FetchCountriesData {
public void getCountriesInformation() throws FileNotFoundException {
ParameterizedSparqlString qs = new ParameterizedSparqlString("PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n "
+ "SELECT * { ?Subject rdf:type <http://dbpedia.org/ontology/Country> . ?Subject ?Predicate ?Object } ORDER BY ?Subject ");
QueryExecution exec = QueryExecutionFactory.sparqlService("https://dbpedia.org/sparql", qs.asQuery());
//exec.setTimeout(10000000);
exec.setTimeout(10, TimeUnit.MINUTES);
ResultSet results = exec.execSelect();
ResultSetFormatter.outputAsCSV(new FileOutputStream(new File("C:/fakepath/CountryData.csv")), results);
ResultSetFormatter.out(results);
}
}
You are almost certainly hitting one of DBpedia's limits. For further information, see http://wiki.dbpedia.org/OnlineAccess and http://lists.w3.org/Archives/Public/public-lod/2011Aug/0028.html.
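If the limit you are hitting is a cap on rows per query (an assumption; it could also be a timeout), a common workaround is to fetch the results in pages with the standard LIMIT and OFFSET clauses. Here is a minimal sketch that only builds the paged query strings. The class name PagedQuerySketch, the method page, and the page size of 10000 (taken from the "about 10,000" in the question) are hypothetical.

```java
public class PagedQuerySketch {

    // Appends LIMIT/OFFSET to a query that already has a stable ORDER BY.
    public static String page(String baseQuery, int pageSize, int pageIndex) {
        if (pageSize <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("pageSize must be > 0 and pageIndex >= 0");
        }
        long offset = (long) pageSize * pageIndex;
        return baseQuery.trim() + " LIMIT " + pageSize + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        String base =
            "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
          + "SELECT * { ?Subject rdf:type <http://dbpedia.org/ontology/Country> . "
          + "?Subject ?Predicate ?Object } ORDER BY ?Subject";
        // Print the first three page queries.
        for (int i = 0; i < 3; i++) {
            System.out.println(page(base, 10000, i));
        }
    }
}
```

You would execute each page query in turn and stop as soon as a page returns fewer rows than the page size. The ORDER BY matters: without a stable order, pages may overlap or skip rows.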
I am trying to create a new object with the PUT method and add some of my own prefixes with a SPARQL query, but the object is created without the added prefixes. It works with POST and PATCH, though. Why is that, and is there an alternative way to use SPARQL with the PUT method and user-defined prefixes?
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX indexing: <http://fedora.info/definitions/v4/indexing#>
DELETE { }
INSERT {
<> indexing:hasIndexingTransformation "default";
rdf:type indexing:Indexable;
dc:title "title3";
dc:identifier "test:10";
}
WHERE { }
To be clear, none of the values specified in the INSERT clause above are added at all.
EDIT1:
url = 'http://example.com/rest/object1'
payload = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX indexing: <http://fedora.info/definitions/v4/indexing#>
PREFIX custom: <http://customnamespaces/custom#/>
DELETE { }
INSERT {
<> indexing:hasIndexingTransformation "default";
rdf:type indexing:Indexable;
dc:title "title1";
custom:objState "Active";
custom:ownerId "Owner1";
dc:identifier "object1";
}
WHERE { }
"""
headers = {
'content-type': "application/sparql-update",
'cache-control': "no-cache"
}
response = requests.request("PUT", url, data=payload, headers=headers, auth=('username','password'))
Prefixes are not triples and therefore cannot be added using a SPARQL query. You can always specify prefixes in the SPARQL query, and they will be expanded to the correct URIs for storage in your triple store.
For example, with your query, indexing:hasIndexingTransformation will be stored in the triple store as <http://fedora.info/definitions/v4/indexing#hasIndexingTransformation>.
Also note that your custom namespace is defined incorrectly: it ends with both a hash and a slash. It should be either PREFIX custom: <http://customnamespaces/custom#> or PREFIX custom: <http://customnamespaces/custom/>.
There is no reason to store the prefix in the triple store (prefixes are an artifact of the text serialization, not of the data itself), so you can subsequently query this data in one of two ways.
1) Using a prefix
PREFIX indexing: <http://fedora.info/definitions/v4/indexing#>
SELECT ?o {
[] indexing:hasIndexingTransformation ?o .
}
2) Using the full URI:
SELECT ?o {
[] <http://fedora.info/definitions/v4/indexing#hasIndexingTransformation> ?o .
}
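Prefix expansion is nothing more than string concatenation, which is why the two queries are equivalent. The following sketch shows this in plain Java. The class name PrefixExpansionSketch and the method expand are made up for illustration, and no triple store is involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixExpansionSketch {

    // Expands prefix:local to the full IRI using the declared namespaces.
    public static String expand(Map<String, String> prefixes, String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String namespace = prefixes.get(prefixedName.substring(0, colon));
        if (namespace == null) {
            throw new IllegalArgumentException("Undeclared prefix in: " + prefixedName);
        }
        return namespace + prefixedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("indexing", "http://fedora.info/definitions/v4/indexing#");
        // The namespace exactly as declared in the question, ending in "#/".
        prefixes.put("custom", "http://customnamespaces/custom#/");

        System.out.println(expand(prefixes, "indexing:hasIndexingTransformation"));
        System.out.println(expand(prefixes, "custom:objState"));
    }
}
```

The second line of output is http://customnamespaces/custom#/objState, which shows what the "#/" ending does to the stored property URIs.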
I am using the Jena Java framework to query the DBpedia endpoint with SPARQL, to get the type of every point of interest in German cities. I have no issue with places that have English DBpedia entries. But when the place name has to be queried from the German DBpedia (http://de.dbpedia.org/resource/Schloß_Nymphenburg), the query returns no result. This problem is also mentioned here (http://mail-archives.apache.org/mod_mbox/jena-users/201110.mbox/%3C4E877C8A.4050705@apache.org%3E). Even after reading that thread, I am unable to solve the problem, and I don't know how to work with QueryEngineHTTP. I am adding two code snippets: one that works (the first, a query for Allianz Arena, which has an English entry in DBpedia) and one that doesn't (the second, for Schloß Nymphenburg, which has a German entry).
This might be a trivial issue, but I am unable to solve it. Any pointers to a solution would be very helpful.
Thanks a lot!
Code 1 - working :
String service = "http://dbpedia.org/sparql";
final ParameterizedSparqlString query = new ParameterizedSparqlString(
"PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>" +
"PREFIX dbo: <http://dbpedia.org/ontology/>" +
"PREFIX dcterms: <http://purl.org/dc/terms/>" +
"SELECT * WHERE {" +
"?s geo:lat ?lat ." +
"?s geo:long ?long ." +
"?s dcterms:subject ?sub}");
query.setIri("?s", "http://dbpedia.org/resource/Allianz_Arena");
QueryExecution qe = QueryExecutionFactory.sparqlService(service, query.toString());
ResultSet results = qe.execSelect();
ResultSetFormatter.out(System.out, results);
Code 2 - not working :
String service = "http://dbpedia.org/sparql";
final ParameterizedSparqlString query = new ParameterizedSparqlString(
"PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>" +
"PREFIX dbo: <http://dbpedia.org/ontology/>" +
"PREFIX dcterms: <http://purl.org/dc/terms/>" +
"SELECT * WHERE {" +
"?s geo:lat ?lat ." +
"?s geo:long ?long ." +
"?s dcterms:subject ?sub}");
query.setIri("?s", "http://de.dbpedia.org/resource/Schloß_Nymphenburg");
QueryExecution qe = QueryExecutionFactory.sparqlService(service, query.toString());
ResultSet results = qe.execSelect();
ResultSetFormatter.out(System.out, results);
I don't think this is an issue with Jena at all. Trying:
SELECT * WHERE {
<http://de.dbpedia.org/resource/Schloß_Nymphenburg> ?p ?o }
at http://dbpedia.org/sparql I get no results: try it yourself.
SELECT * WHERE {
<http://de.dbpedia.org/resource/Schloss_Nymphenburg> ?p ?o }
by contrast returns something, even if it's just a bunch of cross links.
I have the following Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?type
WHERE
{
{
SELECT *
WHERE
{
?x rdfs:subClassOf ?type .
}
}
OPTION (TRANSITIVE, t_distinct, t_in (?x), t_out (?type) ) .
FILTER (?x = <http://dbpedia.org/ontology/Hospital>)
}
It works fine when I send it to a Virtuoso endpoint, but it does not work on my Jena instance. Specifically, I get the following error:
INFO [1] 400 Parse error:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?type
WHERE
{
{
SELECT *
WHERE
{
?x rdfs:subClassOf ?type .
}
}
OPTION (TRANSITIVE, t_distinct, t_in (?x), t_out (?type) ) .
FILTER (?x = <http://dbpedia.org/ontology/Hospital>)
}
Lexical error at line 12, column 39. Encountered: " " (32), after : "OPTION" (17 ms)
In case this is a Virtuoso-specific function, I would appreciate knowing an equivalent for this query that works with Jena (standard SPARQL). The expected output should be:
http://dbpedia.org/ontology/Building
http://dbpedia.org/ontology/ArchitecturalStructure
http://dbpedia.org/ontology/Place
http://dbpedia.org/ontology/d0:Location
which represents all superclasses of "Hospital".
This is the expected behavior. This part of the query:
OPTION (TRANSITIVE, t_distinct, t_in (?x), t_out (?type) )
is not standard SPARQL 1.1; it is a Virtuoso-specific extension.
Jena is a SPARQL 1.1 compliant implementation, so it rejects the OPTION clause.
The following query does the same thing using standard SPARQL 1.1 syntax, and should work with both Fuseki and Virtuoso (I just tested it on the DBpedia endpoint and got the same result):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?type
WHERE
{
{
SELECT *
WHERE
{
?x rdfs:subClassOf+ ?type .
}
}
FILTER (?x = <http://dbpedia.org/ontology/Hospital>)
}
The feature used is a "property path": the + after rdfs:subClassOf matches a chain of one or more rdfs:subClassOf edges.
See http://www.w3.org/TR/sparql11-query/
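If it helps to see what rdfs:subClassOf+ computes, here is a small sketch of the same "one or more steps" traversal in plain Java over a toy hierarchy. The three edges below are an assumption inferred from the expected output in the question, not data fetched from DBpedia, and the class and method names are made up.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SubClassClosureSketch {

    // Returns every class reachable from start via one or more direct
    // superclass edges, in breadth-first order.
    public static Set<String> superClasses(Map<String, List<String>> direct, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(
                direct.getOrDefault(start, Collections.<String>emptyList()));
        while (!todo.isEmpty()) {
            String current = todo.poll();
            // Follow each class only once, so cycles cannot loop forever.
            if (seen.add(current)) {
                todo.addAll(direct.getOrDefault(current, Collections.<String>emptyList()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Toy edges (assumed for illustration only).
        Map<String, List<String>> direct = new HashMap<>();
        direct.put("Hospital", Arrays.asList("Building"));
        direct.put("Building", Arrays.asList("ArchitecturalStructure"));
        direct.put("ArchitecturalStructure", Arrays.asList("Place"));

        System.out.println(superClasses(direct, "Hospital"));
    }
}
```

For the toy hierarchy this prints [Building, ArchitecturalStructure, Place]. The start class itself is not included, which matches + (one or more steps); the * operator (zero or more steps) would include it.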
I want to get the latitude and longitude of a place whose name I already know by
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?s a dbo:Place .
?s geo:lat ?lat .
?s geo:long ?long .
}
where the name of the place (?s) is something like Graves Park.
How would one go about implementing the same in Jena where the name of the place might vary?
You can use Jena's ARQ to execute queries against remote SPARQL endpoints. The process is described in ARQ — Querying Remote SPARQL Services.
Using ParameterizedSparqlStrings in SELECT queries
To do this for different places that you might not know until it is time to execute the query, you can use a ParameterizedSparqlString to hold the query and then inject the value(s) once you have them. Here's an example. The query is the one you provided. I put it into a ParameterizedSparqlString, and then used setIri to set ?s to http://dbpedia.org/resource/Mount_Monadnock.
import com.hp.hpl.jena.query.ParameterizedSparqlString;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;
public class DBPediaQuery {
public static void main( String[] args ) {
final String dbpedia = "http://dbpedia.org/sparql";
final ParameterizedSparqlString queryString
= new ParameterizedSparqlString(
"PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>"+
"PREFIX dbo: <http://dbpedia.org/ontology/>" +
"SELECT * WHERE {" +
" ?s a dbo:Place ." +
" ?s geo:lat ?lat ." +
" ?s geo:long ?long ." +
"}" );
queryString.setIri( "?s", "http://dbpedia.org/resource/Mount_Monadnock");
QueryExecution exec = QueryExecutionFactory.sparqlService( dbpedia, queryString.toString() );
ResultSet results = exec.execSelect();
ResultSetFormatter.out( System.out, results );
}
}
The results printed by this are:
--------------------------------------------------------------------------------------------------------------
| lat | long |
==============================================================================================================
| "42.8608"^^<http://www.w3.org/2001/XMLSchema#float> | "-72.1081"^^<http://www.w3.org/2001/XMLSchema#float> |
--------------------------------------------------------------------------------------------------------------
Once you have the ResultSet, you can iterate through the rows of the solution and extract the values. The values here are Literals, and from a Literal you can extract the lexical form (the string value), or the value as the corresponding Java type (in the case of numbers, strings, booleans, etc.). You could do the following to print the latitude and longitude instead of using the ResultSetFormatter:
while ( results.hasNext() ) {
QuerySolution solution = results.next();
Literal latitude = solution.getLiteral( "?lat" );
Literal longitude = solution.getLiteral( "?long" );
String sLat = latitude.getLexicalForm();
String sLon = longitude.getLexicalForm();
float fLat = latitude.getFloat();
float fLon = longitude.getFloat();
System.out.println( "Strings: " + sLat + "," + sLon );
System.out.println( "Floats: " + fLat + "," + fLon );
}
The output after this change is:
Strings: 42.8608,-72.1081
Floats: 42.8608,-72.1081
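The distinction between the lexical form and the typed value can also be shown without Jena. The following is only a sketch that pulls apart the printed form of a typed literal, such as the one in the table above. The class TypedLiteralSketch and its methods are hypothetical; in real code, use Jena's Literal as shown.

```java
public class TypedLiteralSketch {

    private static final String SEPARATOR = "\"^^<";

    // Returns the text between the quotes of "lexical"^^<datatype>.
    public static String lexicalForm(String printed) {
        int end = printed.lastIndexOf(SEPARATOR);
        if (!printed.startsWith("\"") || end < 1) {
            throw new IllegalArgumentException("Not a typed literal: " + printed);
        }
        return printed.substring(1, end);
    }

    // Returns the datatype IRI between ^^< and the closing >.
    public static String datatype(String printed) {
        int start = printed.lastIndexOf(SEPARATOR);
        if (start < 0 || !printed.endsWith(">")) {
            throw new IllegalArgumentException("Not a typed literal: " + printed);
        }
        return printed.substring(start + SEPARATOR.length(), printed.length() - 1);
    }

    public static void main(String[] args) {
        String printed = "\"42.8608\"^^<http://www.w3.org/2001/XMLSchema#float>";
        String lexical = lexicalForm(printed);     // the string value
        float value = Float.parseFloat(lexical);   // the Java value
        System.out.println(lexical + " as " + datatype(printed) + " parses to " + value);
    }
}
```

The lexical form is always a string. Converting it to a float is a separate step that only makes sense because the datatype says the literal is a float.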
Using ParameterizedSparqlStrings in CONSTRUCT queries
Based on some of the comments, it may also be useful to use CONSTRUCT queries to save the results from each query, and to aggregate them into a larger model. Here's code that uses a CONSTRUCT query to retrieve the latitude and longitude of Mount Monadnock and Mount Lafayette, and stores them in a single model. (Here we're just using CONSTRUCT WHERE {…}, so the model that is returned is exactly the same as the part of the graph that matched. You can get different results by using CONSTRUCT {…} WHERE {…}.)
import com.hp.hpl.jena.query.ParameterizedSparqlString;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
public class DBPediaQuery {
public static void main( String[] args ) {
final String dbpedia = "http://dbpedia.org/sparql";
final ParameterizedSparqlString queryString
= new ParameterizedSparqlString(
"PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>"+
"PREFIX dbo: <http://dbpedia.org/ontology/>" +
"CONSTRUCT WHERE {" +
" ?s a dbo:Place ." +
" ?s geo:lat ?lat ." +
" ?s geo:long ?long ." +
"}" );
Model allResults = ModelFactory.createDefaultModel();
for ( String mountain : new String[] { "Mount_Monadnock", "Mount_Lafayette" } ) {
queryString.setIri( "?s", "http://dbpedia.org/resource/" + mountain );
QueryExecution exec = QueryExecutionFactory.sparqlService( dbpedia, queryString.toString() );
try {
Model results = exec.execConstruct();
allResults.add( results );
} finally {
// Release the connection before issuing the next query.
exec.close();
}
}
allResults.setNsPrefix( "geo", "http://www.w3.org/2003/01/geo/wgs84_pos#" );
allResults.setNsPrefix( "dbo", "http://dbpedia.org/ontology/" );
allResults.setNsPrefix( "dbr", "http://dbpedia.org/resource/" );
allResults.write( System.out, "N3" );
}
}
The output shows triples from both queries:
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix dbo: <http://dbpedia.org/ontology/> .
dbr:Mount_Lafayette
a dbo:Place ;
geo:lat "44.1607"^^<http://www.w3.org/2001/XMLSchema#float> ;
geo:long "-71.6444"^^<http://www.w3.org/2001/XMLSchema#float> .
dbr:Mount_Monadnock
a dbo:Place ;
geo:lat "42.8608"^^<http://www.w3.org/2001/XMLSchema#float> ;
geo:long "-72.1081"^^<http://www.w3.org/2001/XMLSchema#float> .