SPARQL Querying Data Linked Between Two Separate Graphs - sparql

I have a graph of Persons stored in a StarDog instance, and a graph of Addresses in a Fuseki instance. One of the Person instances has a hasAddress relationship with an Address.
I wish to make a SPARQL query that simply returns all the Persons who have an address, as well as the address at which they live. This query is going to sent to the StarDog instance (the endpoint with the Person graph). The query I am using is
prefix testOnt: <http://www.example.org/test-ontology#>
SELECT ?person ?address
WHERE {
?person testOnt:hasAddress ?address
}
However, this returns no results.
I am initialising the StarDog database using this bit of TTL:
:SomeDude rdf:type owl:NamedIndividual , :Person;
:hasAddress :Address1.
Where Address1 is defined in the TDB on the Fuseki side.
I just don't know how I'm meant to reference the second graph when I make a query to the other.
Thanks for any help, and I can clarify any points in the comments.

Your query to retrieve just ?person testOnt:hasAddress ?address should work against Stardog but would not return address information stored in Fuseki. If this query is not working then make sure the namespaces you use in the data and the query match.
In order to get people instances from Stardog and their addresses from Fuseki you will need a federated SPARQL query that uses the SERVICE keyword. Assuming namespaces are correct, your query should look like this:
prefix testOnt: <http://www.example.org/test-ontology#>
SELECT ?person ?address
WHERE {
?person testOnt:hasAddress ?address
SERVICE <http://fuseki/location> {
?address ?p ?o # use a more specific BGP here if necessary
}
}

Related

Why different endpoints do not query same datasets?

I would like to query datasets such as FOAF and DBPedia. The aim is to run quite simple requests such as “Which paintings did Magritte painted ?”,”Which are the American actor who played in American movies ?” …
So I wrote my queries, and used DBpedia snorql to run them. Then, for some other reasons, I tried Live DBpedia and OpenLinks demo.openlinksw.com to discover that the results were different according to the endpoint.
Here are 2 examples :
1) Answer with DBpedia SnorQL but neither Live DBpedia nor OpenLinks demo.openlinksw.com
#works of Magritte
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT * WHERE {
?person a dbo:Artist .
?person foaf:surname "Magritte"#en .
?work dbo:author ?person .
OPTIONAL {?work dbp:year ?year ; dbo:museum ?museum .}
}
ORDER BY ?year
2) Answer with Live DBpedia but neither DBpedia SnorQL nor OpenLinks demo.openlinksw.com
#american actors from Willem Robert van Hage R tutorial
SELECT ?actor ?movie ?director ?movie_date
WHERE {
?m dc:subject <http://dbpedia.org/resource/Category:American_films> .
?m rdfs:label ?movie .
FILTER(LANG(?movie) = "en")
?m dbp:released ?movie_date .
FILTER(DATATYPE(?movie_date) = xsd:date)
?m dbp:starring ?a .
?a rdfs:label ?actor .
FILTER(LANG(?actor) = "en")
?m dbp:director ?d .
?d rdfs:label ?director .
FILTER(LANG(?director) = "en")
}
LIMIT 1000
I thought an endpoint was simply a tool to query dataset whatever it is. So I thought that you can query DBPedia and FOAF from dbpedia, live dbpedia or openlinks demo.openlinksw.com ..
I read that actually different endpoints use different datasets but I can’t get why, as you give specific URI to reach.
Why do same query returns different results according to the SPARQL endpoint ?
Much like different instances of SQL DBMS (such as [in no particular order and without implication of endorsement] OpenLink Virtuoso, Oracle, MySQL, Informix, SQL Server, Sybase, DB2, PostgreSQL, Ingres, Progress OpenEdge, and many others) hold different data, different instances (read: SPARQL endpoints) of RDF RDBMS, also known as Quadstores or Triplestores (such as [in no particular order and without implication of endorsement] OpenLink Virtuoso, AllegroGraph, Stardog, Neo4J, MarkLogic, and many others) hold different data.
You cannot query Joe's database in DBMS A through Fred's database in DBMS B -- unless someone has already told Fred's database and/or DBMS about Joe's database and/or DBMS (e.g., VDBMS functionality), or you include some information about Joe's database and/or DBMS in your query (e.g., SPARQL Federation), etc.
(A "DBMS" is a Database Management System, such as listed above. A "database" is a collection of data, typically stored in a [large] document, which is managed by a DBMS.)
Of particular note relative to your question --
FOAF is an ontology, a vocabulary, which is used to describe entities.
DBpedia is a dataset (which has had various versions over time), and a project, and an organization, and various other things (the ambiguity of literal identifiers!).
OpenLink Software (not "openlinks") is a company which produces OpenLink Virtuoso, among other data-related software and services, and which provides a number of live endpoints on the web -- including the main DBpedia endpoint. (ObDisclaimer: OpenLink Software is also my employer.)

Is there a better way to do case insensitive queries? [duplicate]

I am using Jena ARQ to write a SPARQL query against a large ontology being read from Jena TDB in order to find the types associated with concepts based on rdfs label:
SELECT DISTINCT ?type WHERE {
?x <http://www.w3.org/2000/01/rdf-schema#label> "aspirin" .
?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
}
This works pretty well and is actually quite speedy (<1 second). Unfortunately, for some terms, I need to perform this query in a case-insensitive way. For instance, because the label "Tylenol" is in the ontology, but not "tylenol", the following query comes up empty:
SELECT DISTINCT ?type WHERE {
?x <http://www.w3.org/2000/01/rdf-schema#label> "tylenol" .
?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
}
I can write a case-insensitive version of this query using FILTER syntax like so:
SELECT DISTINCT ?type WHERE {
?x <http://www.w3.org/2000/01/rdf-schema#label> ?term .
?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
FILTER ( regex (str(?term), "tylenol", "i") )
}
But now the query takes over a minute to complete! Is there any way to write the case-insensitive query in a more efficient manner?
From all the the possible string operators that you can use in SPARQL, regex is probably the most expensive one. Your query might run faster if you avoid regex and you use UCASE or LCASE on both sides of the test instead. Something like:
SELECT DISTINCT ?type WHERE {
?x <http://www.w3.org/2000/01/rdf-schema#label> ?term .
?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
FILTER (lcase(str(?term)) = "tylenol")
}
This might be faster but in general do not expect great performance for text search with any triple store. Triple stores are very good at graph matching and not so good at string matching.
The reason the query with the FILTER query runs slower is because ?term is unbound it requires scanning the PSO or POS index to find all statements with the rdfs:label predicate and filter them against the regex. When it was bound to a concrete resource (in your first example), it could use a OPS or POS index to scan over only statements with the rdfs:label predicate and the specified object resource, which would have a much lower cardinality.
The common solution to this type of text searching problem is to use an external text index. In this case, Jena provides a free text index called LARQ, which uses Lucene to perform the search and joins the results with the rest of the query.

Mapping DBpedia types to Wikipedia Categories

I am trying to map DBPedia types to Wikipedia Categories, a simple example would be the following SPARQL query
select distinct ?cat where {
?s a dbpedia-owl:LacrossePlayer; dcterms:subject ?cat . filter(regex(?cat,'players','i') )
} limit 100
SPARQL Result
But this is highly inefficient as it has to first map the DBpedia types to DBpedia Named Entities(resources) and then extract their corresponding Wikipedia categories. I am trying to do this mapping for a lot of other DBpedia types.
Is there a direct or more efficient way to do this?
Improving the filter may help…
As an initial note, you may get some speedup if you remove or improve your filter. You can, of course, just remove it, but you could also make it more efficienct, since you're not really using any special regular expressions. Just do
filter contains(lcase(str(?cat)),'players')
to check whether the URI for ?cat contains the string players. It might even be better (I'm not sure) to grab the English rdfs:label of ?cat and check that, since you wouldn't have to do the case or string conversions.
… but there are lots of results.
But this is highly inefficient as it has to first map the DBpedia
types to DBpedia Named Entities(resources) and then extract their
corresponding Wikipedia categories. I am trying to do this mapping for
a lot of other DBpedia types. Is there a direct or more efficient way
to do this?
I'm not sure exactly what's inefficient in this. The only way that DBpedia types and categories are associated is that resources have types (via rdf:type) and have categories (via dcterms:subject). If you want to find the connections, then you'll need to find the instances of the type and the categories to which they belong. There may be some possibility that you can look into whether any particular infoboxes provide categories to articles and are used in the infobox mapping to provide DBpedia types. That's the only way to get category/DBpedia-types directly, without going through instances that I can think of, and I don't know whether the current dataset has that kind of information.
In general, since Wikipedia categories are not a type hierarchy, there will be lots of categories with which instances of any particular type are associated. For instance, we can count the number of categories associated with the types Fish and LacrossePlayer with a query like this:
select ?type (count(distinct ?category) as ?nCategories) where {
values ?type { dbpedia-owl:Fish dbpedia-owl:LacrossePlayer }
?type ^a/dcterms:subject ?category
}
group by ?type
SPARQL results
type nCategories
http://dbpedia.org/ontology/LacrossePlayer 346
http://dbpedia.org/ontology/Fish 2375
That query responds pretty quickly, and you can even get those categories pretty easily, too:
select distinct ?type ?category where {
values ?type { dbpedia-owl:Fish dbpedia-owl:LacrossePlayer }
?type ^a/dcterms:subject ?category
}
order by ?type
limit 4000
SPARQL results
When you start using types that have many more instances, though, these counts get big, and the queries take a while to return. E.g., a very common type like Place:
select ?type (count(distinct ?category) as ?nCategories) where {
values ?type { dbpedia-owl:Place }
?type ^a/dcterms:subject ?category
}
group by ?type
type nCategories
http://dbpedia.org/ontology/Place 191172
I wouldn't suggest trying to pull all that data down from the remote server. If you want to extract it, you should load the data locally.

How to list all different properties in DBPedia

I have a burning question concerning DBpedia. Namely, I was wondering how I could search for all the properties in DBpedia per page. The URI http://nl.dbpedia.org/property/einde concerns the property "einde". I would like to get all existing property/ pages. This does not seem too hard, but I don't know anything about SPARQL, so that's why I want to ask for some help. Perhaps there is some kind of dump of it, but I honestly don't know.
Rather than asking for pages whose URLs begin with, e.g., http://nl.dbpedia.org/property/, we can express the query by asking “for which values of ?x is there a triple ?x rdf:type rdf:Property in DBpedia?” This is a pretty simple SPARQL query to write. Because I expected that there would be lots of properties in DBPedia, I first wrote a query to count how many there are, and afterward wrote a query to actually list them.
There are 48292 things in DBpedia declared to be of rdf:type rdf:Property, as reported by this SPARQL query, run against one of DBpedia's SPARQL endpoints:
select COUNT( ?property ) where {
?property a rdf:Property
}
SPARQL Results
You can get the list by selecting ?property instead of COUNT( ?property ):
select ?property where {
?property a rdf:Property
}
SPARQL Results
I second Joshua Taylor's answer, however if you want to limit the properties to the Dutch DBpedia, you need to change the default-graph-uri query parameter to nl.dbpedia.org and set the SPARQL endpoint to nl.dbpedia.org/sparql, as in the following query. You will get a result-set of just above 8000 elements.
SELECT DISTINCT ?pred WHERE {
?pred a rdf:Property
}
ORDER BY ?pred
run query
These are the Dutch translations of the properties that have been mapped from Wikipedia so far. The full English list is also available. According to mappings.dbpedia.org, there are ~1700 properties with missing Dutch translations.

Query dbpedia with SPARQL for films

I want to get data (movie title, director name, actor name and the wikipedia link) of all movies present on dbpedia.
I tried this query on http://dbpedia.org/snorql/.
SELECT ?film_title ?star_name ?nameDirector ?link WHERE {
{
SELECT DISTINCT ?movies ?film_title
WHERE {
?movies rdf:type <http://dbpedia.org/ontology/Film>;
rdfs:label ?film_title.
}
}.
?movies dbpedia-owl:starring ?star;
foaf:isPrimaryTopicOf ?link;
dbpedia-owl:director ?director.
?director foaf:name ?nameDirector.
?star foaf:name ?star_name.
FILTER LANGMATCHES( LANG(?film_title), 'en')
} LIMIT 100
Responses seems correct, but the response time are slow, so I'm wondering if I can improve my query for get a faster response.
There are a couple of things you could change in your query that might make it faster.
Firstly what is the point of your SELECT DISTINCT subquery? Is that merely trying to eliminate duplicate film titles? Removing this may make things faster if you can live with a few duplicates.
Secondly the FILTER clauses requires the database to scan over all the possible matches and evaluate the expression on each possible match to determine whether to keep it or throw it away. Again if you can live with getting some duplicate data and don't mind non-English language tags removing the FILTER may make the query run faster.