I'm trying to build a query that fetches, by name, instances of (or of any subclass of) abstract items such as "human" (Q5). However, the query fails with a timeout, probably because it has to traverse too many nodes in the graph.
Are there any better ways to query this? The best I could come up with is to call the Wikidata API search-entities endpoint with the item name and then filter the results in a SPARQL query, so that the query runs over a small set of candidates instead of the whole graph.
I'm a little worried about using this method in a production environment, since the Wikidata SPARQL service is in beta. Are there any best practices for migrating knowledge graph use cases from Freebase? Is there any update regarding the migration of data from Freebase to Wikidata?
Finally, are there any other mature alternatives to the deprecated Freebase service?
What endpoint are you querying against? Querying a shared public endpoint with no SLA (beta or not) for a production service is a very risky proposition.
Wikidata offers full database dumps that you can tailor/subset and load into whatever infrastructure you like. That would give you complete control over performance, quality, and any other metrics which are important to you.
As far as migrating from Freebase goes, there is no migration path. The track that train was on has come to an end (at least for external non-Google users). It's not just deprecated, it was shut down completely a while ago. A tiny fraction of the data was imported to Wikidata (and they shared a bunch in common already due to their common ancestor Wikipedia), but none of the programmatic features, such as MQL's JSON query-by-example, Freebase Search, Freebase Suggest, or Google-scale performance and availability, are available (yet?) for Wikidata.
If the data is important to you, you should self-host using whatever infrastructure meets your needs.
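The search-then-filter idea from the question can be sketched roughly as follows. This is a minimal illustration, not a tested production client: it assumes the public `wbsearchentities` API action and the prefixes (`wd:`, `wdt:`) that the Wikidata Query Service predefines. The function names and the User-Agent string are made up for the example.

```python
import json
import re
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(name, language="en", limit=20):
    """Build a wbsearchentities URL that looks up items by label or alias."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def candidate_ids(name):
    """Call the search API (needs network access) and return matching Q-ids."""
    request = urllib.request.Request(
        search_url(name),
        headers={"User-Agent": "example-script/0.1"},  # hypothetical agent name
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return [hit["id"] for hit in payload.get("search", [])]


def filter_query(qids, class_id="Q5"):
    """Build a SPARQL query that keeps only the candidates that are instances
    of class_id or of one of its subclasses."""
    for qid in list(qids) + [class_id]:
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError("not a Wikidata item id: " + repr(qid))
    values = " ".join("wd:" + qid for qid in qids)
    return (
        "SELECT ?item WHERE {\n"
        "  VALUES ?item { " + values + " }\n"
        "  ?item wdt:P31/wdt:P279* wd:" + class_id + " .\n"
        "}"
    )
```

Because the `VALUES` clause binds `?item` to a handful of candidates, the engine only has to walk the class hierarchy upward from those items, which is the restriction of the query domain described in the question. The same query text would work against a self-hosted copy of the data.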
I need to run a SPARQL query over a semantic database, but some of the triples are not stored in the database; they are provided by web services (which are not SPARQL endpoints). I want to run a SELECT query that takes those additional triples into account without having to insert them into the database. Is there a way to do that?
This is not part of the SPARQL spec, so "no" is the general answer.
That said, Virtuoso (possibly among others) lets you include an external RDF source (a/k/a webservice) as part of the FROM (among other methods), to be dereferenced during SPARQL query processing.
Such a web service need not be a SPARQL endpoint, but the best performance will result if it provides RDF (though the serialization may vary).
The Virtuoso Sponger can also be invoked on the fly to derive RDF from many document formats (with an obvious performance hit). To pursue this, please raise it on the OpenLink Community Forum.
We are going to have the Semantic Web. Now we have the LOD cloud.
Every data set has its own SPARQL endpoint.
I can query the dataset triples.
How can I query the whole semantic web or LOD?
No, there is no such single SPARQL endpoint, because the Semantic Web is decentralized by design. However, SPARQL 1.1 supports federated queries over different SPARQL endpoints using the SERVICE keyword. See https://www.w3.org/TR/sparql11-federated-query/ for reference. More specifically, there is a mention in the literature about how to determine which data sources might be relevant for query answering at Internet scale:
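As a rough illustration of the SERVICE keyword, the sketch below only assembles the text of a federated query; it does not send it anywhere. The helper name, the `http://example.org/livesIn` property and the choice of DBpedia as the remote endpoint are assumptions made for the example.

```python
def federated_query(remote_endpoint, local_pattern, remote_pattern, variables="*"):
    """Return a SELECT query whose second pattern is evaluated remotely.

    The engine that receives this query matches local_pattern against its
    own data and forwards remote_pattern to remote_endpoint via SERVICE.
    """
    return (
        f"SELECT {variables} WHERE {{\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        f"  }}\n"
        f"}}"
    )


query = federated_query(
    "https://dbpedia.org/sparql",
    # Hypothetical local data: people linked to DBpedia city resources.
    "?person <http://example.org/livesIn> ?city .",
    # Remote data: the population recorded for that city.
    "?city <http://dbpedia.org/ontology/populationTotal> ?population .",
    "?person ?city ?population",
)
print(query)
```

Note that the endpoints to contact still have to be named explicitly in the query, which is why federation alone does not amount to querying the whole Web of Data.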
Hartig O., Bizer C., Freytag J.C. (2009) Executing SPARQL queries over the Web of Linked Data. In: Bernstein A. et al. (eds.) The Semantic Web – ISWC 2009. ISWC 2009. Lecture Notes in Computer Science, vol. 5823, pp. 293–309. Heidelberg: Springer. doi: 10.1007/978-3-642-04930-9_19
There exists a W3C-owned and (un-?)maintained wiki page with ~60 SPARQL endpoints. Many "last accessed/checked" entries are from 2010. On that page is a link to http://sparqles.ai.wu.ac.at/availability which lists more endpoints and is much more recent and up-to-date.
Read the second paragraph, titled "SPARQL Endpoints", of the blog post Querying DBpedia with GraphQL for a skeptical view of the state of SPARQL today. I cannot say it any better myself.
Also note that SPARQL permits every endpoint to offer any number of named graphs (the GRAPH construct) that can be queried at that endpoint. That is one more feature to consider.
There is no central point regarding the notion of a Semantic Web of Linked Data. Instead, like any Super Information Highway, you have major concentration points (hubs or junctions) that enable you to discover routes to a variety of destinations.
Major Semantic Web of Linked Data hubs that we oversee at OpenLink Software include:
DBpedia
DBpedia-Live
URIBurner
LOD Cloud Cache
Remember, the fundamental principle behind Linked Open Data is that hyperlinks (HTTP URIs) function as words in sentences constructed using RDF Language. Thus, you can use the SPARQL Query Language to produce query solutions (tables or graphs) that expose desired routes (e.g., using Property Paths).
Finally, you can also use Federated SPARQL Query (SPARQL-FED) to navigate a Semantic Web of Linked Data.
Examples:
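PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Note: the {1,3} path quantifier is not part of the final SPARQL 1.1 grammar;
# this query assumes an engine that accepts it, such as the endpoints listed above.
SELECT DISTINCT *
WHERE {
  ?s a <http://dbpedia.org/ontology/AcademicJournal> ;
     rdf:type{1,3} ?o
}
LIMIT 50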
Query Solution Document Link.
We are also working on a publicly available Google Spreadsheet that provides additional information related to the kinds of datasets accessible via the LOD Cloud that we maintain.
To my knowledge LOD-a-lot is currently the one ongoing effort that gets closest to the vision of querying the whole web of data. And this is obviously done using different means than SPARQL endpoints.
It's still a prototype, which means bugs, but one of the aims of wimuQ is to provide a way to query all 539 public SPARQL endpoints plus all datasets from LODLaundromat and LODStats, that is, more than 600,000 datasets and more than 5 terabytes. As far as I know, it is the most extensive collection of datasets accessible from a single place.
For more information, the paper is available here:
I've been trying to figure out how to set up a SPARQL endpoint for a couple of days, but however much I read, I cannot understand it.
To explain my intention: I have an open data server running on CKAN, and my goal is to be able to run SPARQL queries on the data. I know I cannot do this directly on the datasets themselves; I would have to define my own OWL ontology and convert the data I want to use from CSV (the format it is currently in) to RDF triples (to be used as linked data).
The idea was to test first with the repository metadata that the ckanext-dcat extension can generate automatically, but I really cannot find where to start. I've searched for information on how to install a Virtuoso server for SPARQL, but what I've found leaves a lot to be desired, and I cannot find anything that explains how to actually load my own OWL and RDF files into Virtuoso.
Can someone lend me a hand with how to get started? Thank you.
I'm a little confused. Maybe this is two or more questions?
1. How to convert tabular data, like CSV, into the RDF semantic format?
This can be done with an R2RML approach. Karma is a great GUI for that purpose. Like you say, a conversion like that can really be improved with an underlying OWL ontology. But it can be done without creating a custom ontology, too.
I have elaborated on this in the answer to another question.
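To make the CSV-to-RDF step concrete, here is a deliberately naive sketch that turns each CSV row into N-Triples using only the Python standard library. It is not R2RML and not what Karma does internally; the `http://example.org/` namespace and the column-name-to-property mapping are placeholders that a real ontology would replace.

```python
import csv
import io
import urllib.parse

BASE = "http://example.org/"  # hypothetical namespace for the example


def literal(value):
    """Serialize a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def csv_to_ntriples(text, id_column):
    """Map each row to one subject and each remaining column to one property."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = "<" + BASE + "resource/" + urllib.parse.quote(row[id_column], safe="") + ">"
        for column, value in row.items():
            # Skip the key column, empty cells, and overflow cells with no header.
            if column is None or column == id_column or not value:
                continue
            predicate = "<" + BASE + "property/" + urllib.parse.quote(column, safe="") + ">"
            lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines


for line in csv_to_ntriples("id,name\n1,Ada\n", "id"):
    print(line)
```

Every value becomes a plain string literal here. Typed literals, links between resources and reuse of existing vocabularies are exactly the parts where a mapping language and an ontology earn their keep.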
2. Now that I have some RDF formatted data, how can I expose it with a SPARQL endpoint?
Virtuoso is a reasonable choice. There are multiple ways to deploy it and multiple ways to load the data, and therefore lots of tutorials on the subject. Here's a good one, from DBpedia.
If you'd like a simpler path to starting an RDF triplestore with a SPARQL endpoint, Stardog and Blazegraph are available as JARs, and RDF4J can easily be deployed within a container like Tomcat.
All provide web-based graphical interfaces for loading data and running queries, in addition to SPARQL REST endpoints. At least Stardog also provides command-line tools for bulk loading.
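Once a store is running, its SPARQL endpoint can be queried over plain HTTP following the SPARQL 1.1 Protocol. The sketch below shows one way to do that with the standard library; the endpoint URL in the usage note is an assumption about a local installation, so substitute whatever address your store actually exposes.

```python
import json
import urllib.parse
import urllib.request


def build_select_request(endpoint, query):
    """Build a GET request that passes the query in the 'query' parameter
    and asks for results in the SPARQL JSON results format."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint, query):
    """Send the request (needs a reachable endpoint) and return the bindings."""
    request = build_select_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.load(response)
    return results["results"]["bindings"]
```

For instance, `run_select("http://localhost:8890/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 5")` would return up to five bindings if a store is listening at that (assumed) address.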
I'm new to the Semantic Web.
I'm trying to build a sample application in which I can query data from different data sources in one query.
I have created a small RDF file that contains references to DBpedia resources for defining localities. My question is: how can I get both the data contained in my file and other information from the description of the remote resource (for example, the name of a person from the local file and the total population of a city, dbpedia-owl:populationTotal, from the remote RDF data)?
I don't really understand the SPARQL query language. I tried to use the Jena ARQ API with the SERVICE keyword, but it didn't solve the problem.
Any help please?
I guess you are looking for something like the Semantic Web Client Library, which tries to leverage the GGG (Giant Global Graph). Note, however, that the framework's standard exploration algorithm follows rdfs:seeAlso links. Nevertheless, the general approach seems to be what you are looking for: you create a local graph that starts from your seed graph, traverse its relations up to a certain depth (e.g., three steps), resolve the URIs you encounter, and load that content into your local triple store. Using more advanced technologies such as SPARQL federation might be something for later ;)
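The traversal idea can be sketched as a breadth-first crawl. This is not the Semantic Web Client Library's algorithm, only an illustration of the general approach: `fetch` is a hypothetical callable that dereferences a URI and returns the triples found there, and it is replaced by an in-memory stub so that the example runs without network access or an RDF parser.

```python
from collections import deque


def is_uri(value):
    """Treat http(s) strings as dereferenceable URIs, everything else as literals."""
    return value.startswith(("http://", "https://"))


def crawl(seed_triples, fetch, max_depth=3):
    """Grow a local set of triples by dereferencing object URIs breadth-first.

    Objects of the seed triples are at depth 1, objects found in their
    descriptions at depth 2, and so on up to max_depth.
    """
    graph = set(seed_triples)
    seen = set()
    queue = deque((obj, 1) for _, _, obj in seed_triples if is_uri(obj))
    while queue:
        uri, depth = queue.popleft()
        if uri in seen or depth > max_depth:
            continue
        seen.add(uri)
        for triple in fetch(uri):
            graph.add(triple)
            if is_uri(triple[2]):
                queue.append((triple[2], depth + 1))
    return graph


# Stub standing in for the Web: URI -> triples served at that URI.
EX = "http://example.org/"
WEB = {
    EX + "city/A": [
        (EX + "city/A", EX + "population", "1000"),
        (EX + "city/A", EX + "country", EX + "country/X"),
    ],
    EX + "country/X": [(EX + "country/X", EX + "name", "X-land")],
}


def fake_fetch(uri):
    return WEB.get(uri, [])


seed = [(EX + "person/1", EX + "livesIn", EX + "city/A")]
local_graph = crawl(seed, fake_fetch, max_depth=2)
```

A real `fetch` would issue an HTTP request with an RDF media type in the Accept header and parse the response; the resulting local graph can then be queried with ordinary, non-federated SPARQL.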
I have retrieved data from two different sources using a SPARQL query with named graphs.
I used Jena ARQ to execute the SPARQL query.
Recently I found out that I can also query through the Jena Model API after loading the model from an RDF file, and it gives me the same output as the SPARQL query. So I want to know: is it a good idea to work that way, without SPARQL? So far I have only tested it with a small RDF file. I also want to know whether, if I use Virtuoso, I can manipulate the data through the Model API without SPARQL.
Thanks in advance.
I'm not quite sure if I understand your question. If I can paraphrase, I think you're asking:
Is it OK to query and manipulate RDF data using the Jena Model API instead of using SPARQL? Does it make a difference if the back-end store is Virtuoso?
Assuming that's the right re-phrasing of the question, then the answer to the first part is definitely yes: you can manipulate RDF data through the Model and OntModel APIs. In fact, I would say that's what the majority of Jena users do, particularly for small queries or updates. Personally, I find that going directly to the API is more succinct up to a certain point of complexity; after that, my code is clearer and more concise if I express the query in SPARQL. Obviously circumstances will have an effect: if you're working with a mixture of local stores and remote SPARQL endpoints (for which sending a query string is your only option), then you may find that the consistency of always using SPARQL makes your code clearer.
Regarding Virtuoso, I don't have any direct experience to offer. As far as I know, the Virtuoso Jena Provider fully implements the features of the Model API using a Virtuoso store as the storage layer. Whether the direct API or using SPARQL queries gives you a performance advantage is something you should measure by benchmark with your data and your typical query patterns.