Unable to use SPARQL SERVICE with FactForge - sparql

I am trying to access FactForge from Sesame triplestore. This is the query:
select *
where{
SERVICE <http://factforge.net/sparql>{
?s ?p ?o
}
}
LIMIT 100
The query doesn't get executed. The same structure works with DBpedia. FactForge's SPARQL endpoint on the web is working. What do I need to do to access the endpoint successfully from Sesame?

What you need to do is write a more meaningful (or at least more constrained) query. Your query is just selecting all possible triples, which presumably puts a lot of stress on the factforge endpoint (which contains about 3 billion triples). The reason your query "does not get executed" (which probably means that you are just waiting forever for the query to return a result) is that it takes the SPARQL endpoint a very long time to return its response.
The LIMIT 100 you put on the query is outside the scope of the SERVICE clause, and therefore not actually communicated to the remote endpoint you're querying. While in this particular case it would be possible for Sesame's optimizer to add that (since there are no additional constraints in your query outside the scope of the SERVICE clause), unfortunately it's currently not that smart - so the query sent to factforge is without a limit, and the actual limit is only applied after you get back the result (which, in the case of your "give me all your triples" query, naturally takes a while).
However, demonstrably the SERVICE clause does work for FactForge when used from Sesame, because if you try a slightly more constrained query, for example a query selecting all companies:
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select *
where{
SERVICE <http://factforge.net/sparql>{
?s a dbp-ont:Company
}
} LIMIT 100
it works fine, and you get a response.
More generally speaking, I should recommend that if you want to do queries that are specifically to a particular SPARQL endpoint, you should use a SPARQL endpoint proxy (which is one of the repository types available in Sesame) instead of using the SERVICE clause. SERVICE is only really useful when trying to combine data from your local repository with that of a remote endpoint in a single query. Using a SPARQL endpoint proxy allows you to make sure LIMIT clauses are actually communicated to the endpoint, and just generally will give you better performance than a SERVICE query will.

Related

Realizing and Retrieving information from a producer-consumer OWL model

I have the following scenario modelled in OWL:
Producer ----producesResource---> Resource <------consumesResource ---- Consumer
Producer, Resource and Consumer are OWL Classes, while producesResource and consumesResource are object properties. The scenario is quite intuitive in that that each producer produces one or more resources that is consumed by one or more consumers. Conversely, each consumer can consume one or more resources. The ontology is populated with instances / individuals accordingly.
I would like to check if there exists a resource that is consumed by a consumer that is not produced by a Producer. What is an elegant way to get this information via a:
Query in SPARQL
SHACL Shapes graph (if possible).
Negations are possible in SPARQL using the NOT BOUND filter or more easily in SPARQL 1.1 using MINUS:
SELECT ?resource WHERE
{
?resource a :Resource.
?consumer a :Consumer;
?consumer :consumesResource ?resource.
MINUS {?producer a :Producer; :producesResource ?resource.}
}
You can also use ASK to get a boolean result but SELECT allows easier debugging to verify if your query is working correctly.
As SHACL allows integrating SPARQL queries, this answers your second question too but in that case it is easier to just use the SPARQL query on its own.

Wikidata Virtuoso SPARQL Endpoint - How to get more than 100,000 results

I need to get Wikidata artifacts (instance-types, redirects and disambiguations) for a project.
As the original Wikidata endpoint has time constraints when it comes to querying, I have come across Virtuoso Wikidata endpoint.
The problem I have is that if I try to get for example the redirects with this query, it only returns 100,000 results at most:
PREFIX owl: http://www.w3.org/2002/07/owl#
CONSTRUCT {?resource owl:sameAs ?resource2}
WHERE
{
?resource owl:sameAs ?resource2
}
Iā€™m writing to ask if you know of any way to get more than 100,000 results. I would like to be able to achieve the maximum number of possible results.
Once the results are obtained, I must have 3 files (or as few files as possible) in the Ntriples format: wikidata_intance_types.nt, wikidata_redirecions.nt and wikidata_disambiguations.nt.
Thank you very much in advance.
All the best,
Jose Manuel
Please recognize that in both cases (Wikidata itself, and the Virtuoso instance provided by OpenLink Software, my employer), you are querying against a shared resource, and various limits should be expected.
You should space your queries out over time, and consider smaller chunks than the 100,000 limit you've run into -- perhaps 50,000 at a time, waiting for each query to finish retrieving results, plus another second or ten, before issuing the next query.
Most of the guidance in this article about working with the DBpedia public SPARQL endpoint is relevant for any public SPARQL endpoint, especially those powered by Virtuoso. Specific settings on other endpoints will vary, but if you try to be friendly ā€” by limiting the rate of your queries; limiting the size of partial result sets when using ORDER BY, LIMIT, and OFFSET to step through to get a full result set for a query that overflows the instance's maximum result set size; and the like ā€” you'll be far more successful.
You can get and host your own copy of wikidata as explained in
https://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData
There are also alternatives to get a partial dump of wikidata e.g. with https://github.com/bennofs/wdumper
Or ask for access to one of the non public copies we run by sending me a personal e-mail via my RWTH Aachen i5 account

No Data Available When Query on Apache Jena Fuseki

I'm new with Apache Jena Fuseki and SparQL. I have a problem when I tried to query data on Fuseki. The data I used is from DBpedia named 'Topical Concepts' (can be found here). I upload the data through the control panel on a browser (through the default port 3030) and used the query below:
SELECT ?subject ?predicate ?object
WHERE {
?subject ?predicate ?subject
}
LIMIT 25
I got a null table and a message "no data available in this table". However, when I installed Fuseki and do the same thing on my Mac (the problem in above happened on my desktop with Ubuntu 16 operation system), I successfully got 25 entries of the data. I don't think it is the problem of the operation system, but I really have no idea about why it happened. Does anybody encounter the same problem?
In your SPARQL query, you have the following pattern:
?subject ?predicate ?subject
Notice that you repeat ?subject. This query effectively asks: "give me all RDF triples for which the subject is the same value as the object". It's likely that the reason you're not getting a result is simply that no such triples exist in your database.
As for why this didn't happen on a Mac, without more info about your setup we can only speculate. It's possible that you configured your database slightly differently there (e.g. enabling a reasoner which would result in additional RDF triples that do match the query), or it might be as simple as you did a slightly different query there.
I am making two assumptions for answering your question:
you have two different instances of Jena installed. One on your laptop and one on your desktop.
You are sure you have uploaded data, possibly into a named graph and the default is empty
Fuseki, I haven't tried this on TBD, has one feature that, often, by default is set to query only the default graph. If in the config setting you activate tdb:unionDefaultGraph true ; then it will query all the graphs. Before changing the settings, please do check that this is true. You could check by executing this query:
SELECT distinct ?g
WHERE {
graph ?g{
?s ?p ?o
}
}
If you get a result, that means you need to change the settings for it to work, or be mindful of this fact and always call your queries with graphs (as in the above query).
For more explanation please refer to https://jena.apache.org/documentation/serving_data/

Filter by datatype

I'm trying to filter by datatype in DBpedia. For example:
SELECT *
WHERE {?s ?p ?o .
FILTER ( datatype(?o) = xsd:integer)
}
LIMIT 10
But I get no results, while there are certainly integer values. I get the same from other endpoints using Virtuoso, but I do get results from alternative endpoints. What could be the problem? If Virtuoso does not implement this SPARQL function properly, what to use instead?
This is an expensive query, since it has to traverse all triples (?s ?p ?o .). The query's execution time exceeds the maximum time configured for the Virtuoso instance that serves DBpedia's SPARQL endpoint at http://dbpedia.org/sparql.
If you don't use the timeout parameter, then you will get a time out error (Virtuoso S1T00 Error SR171: Transaction timed out). When you use the timeout (by default set to 30000 for the DBpedia endpoint), you will get incomplete results that will contain HTTP headers like these:
X-SQL-State: S1TAT
X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout. Activity: 1.389M rnd 5.146M seq 0 same seg 55.39K same pg 195 same par 0 disk 0 spec disk 0B / 0 me
Empty results thus may be incomplete and it doesn't need to indicate there are no xsd:integer literals in DBpedia. A related discussion about the partial results in Virtuoso can be found here.
As a solution to your query, you can load the DBpedia from dumps and analyze it locally.
As a side note, your query is syntactically invalid, because it's missing namespace for the xsd prefix. You can check the syntax of SPARQL queries via SPARQLer Query Validator. You can find the namespaces for common prefixes using Prefix.cc. Virtuoso that provides the DBpedia SPARQL endpoint ignores the missing namespace for the xsd prefix, but it's a good practice to stick to syntactically valid SPARQL queries for greater interoperability.

DBPedia queries missing certain chemical compounds

I am running this query to get list of all compounds from the DBPedia public SPARQL endpoint.
SELECT * WHERE {
?y rdf:type dbpedia-owl:Drug.
?y rdfs:label ?Name .
OPTIONAL {?y dbpedia-owl:iupacName ?iupacname} .
OPTIONAL {?y dcterms:subject ?y1}
FILTER (langMatches(lang(?Name),"en"))
}
LIMIT 50000
I am downloading in batches of 50000 (2 files) using offset parameter.
Somehow Isopropyl_alcohol is not getting covered in this even where page exists at
http://live.dbpedia.org/page/Isopropyl_alcohol
and it has the properties that I am searching for?
There are two issues here. The first is that DBpedia Live and DBpedia do not have exactly the same content. According to the DBpedia live webpage
Wikipedia users constantly revise Wikipedia articles with updates
happening almost each second. Hence, data stored in the official
DBpedia endpoint can quickly become outdated, and Wikipedia articles
need to be re-extracted. DBpedia Live enables such a continuous
synchronization between DBpedia and Wikipedia.
That page also lists two SPARQL endpoints for DBpedia Live:
http://live.dbpedia.org/sparql
http://dbpedia-live.openlinksw.com/sparql
However, you'll run into issues on both. Isopropyl_alcohol is in DBpedia, and its URI is
http://dbpedia.org/resource/Isopropyl_alcohol (which, when using a web browser, redirects to
http://dbpedia.org/page/Isopropyl_alcohol)
Looking there, we see that Isopropyl alcohol doesn't have rdf:type dbpedia-owl:Drug, but only
owl:Thing
dbpedia-owl:ChemicalCompound
dbpedia-owl:ChemicalSubstance
yago:Alcohols
http://umbel.org/umbel/rc/ChemicalSubstanceType
yago:AlcoholSolvents
so you won't be able to find it with your query on DBpedia, because it doesn't have the type `dbpedia-owl:Drug. Now, Isopropyl_alcohol also exists in DBpedia live, and its URL is
http://live.dbpedia.org/resource/Isopropyl_alcohol, which redirects to
http://live.dbpedia.org/page/Isopropyl_alcohol
but it only has the folllowing rdf:types:
yago:Alcohols
yago:AlcoholSolvents
http://umbel.org/umbel/rc/ChemicalSubstanceType
so it won't be found by your query on DBpedia Live, for the same reason.
The second issue is the one that AndyS pointed out. Even if the query would select Isopropyl_alcohol in DBpedia or DBpedia Live, unless you provide an ordering constraint, the limit/offset combination won't be guaranteed to return it, since without an ordering constraint, the server could legitimately return the same set of 50000 results to you every time.
Maybe it is not finding it in the LIMIT/OFFSET combination you are using. The server is not obliged to answer queries in the same order everytime unless you use ORDER BY so maybe the slices you have are not in fact all results.
Maybe the SPARQL site and live.dbpedia are not in step.
Try asking directly for Isopropyl_alcohol.