We are going to have the Semantic Web. For now, we have the LOD cloud.
Every dataset has its own SPARQL endpoint.
I can query the triples of an individual dataset.
How can I query the whole Semantic Web or the whole LOD cloud?
No, there is no such single SPARQL endpoint, because the Semantic Web is decentralized by design. However, SPARQL 1.1 supports federated queries over different SPARQL endpoints using the SERVICE keyword. See https://www.w3.org/TR/sparql11-federated-query/ for reference. More specifically, there is a mention in the literature about how to determine which data sources might be relevant for query answering at Internet scale:
Hartig O., Bizer C., Freytag J.C. (2009) Executing SPARQL queries over the Web of Linked Data. In: Bernstein A. et al. (eds.) The Semantic Web – ISWC 2009. ISWC 2009. Lecture Notes in Computer Science, vol. 5823, pp. 293–309. Heidelberg: Springer. doi: 10.1007/978-3-642-04930-9_19
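To make the SERVICE keyword mentioned above a bit more concrete, here is a minimal sketch in Python that only builds a federated query string and the GET URL that would carry it. The function names and the local endpoint URL are assumptions made for illustration; the remote endpoint is the public DBpedia one. Nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_federated_query(remote_endpoint: str, limit: int = 10) -> str:
    """Build a SPARQL 1.1 query that delegates one pattern to a remote endpoint."""
    return "\n".join([
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        "SELECT ?s ?label WHERE {",
        # The SERVICE block is evaluated by the remote endpoint, not locally.
        f"  SERVICE <{remote_endpoint}> {{",
        "    ?s rdfs:label ?label .",
        "  }",
        "}",
        f"LIMIT {limit}",
    ])


def build_request_url(local_endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a SPARQL Protocol GET request."""
    return local_endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    q = build_federated_query("https://dbpedia.org/sparql", limit=5)
    print(q)
    # Hypothetical local endpoint; replace with the endpoint you actually run.
    print(build_request_url("http://localhost:3030/ds/sparql", q))
```

The endpoint that receives the query still has to support federation and be allowed to reach the remote service, so treat this as a starting point rather than a guarantee that any given endpoint will answer it.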
There exists a W3C-owned and (un-?)maintained wiki page with ~60 SPARQL endpoints. Many "last accessed/checked" entries are from 2010. On that page is a link to http://sparqles.ai.wu.ac.at/availability which lists more endpoints and is much more recent and up-to-date.
Read the second paragraph, titled "SPARQL Endpoints", of the blog post "Querying DBpedia with GraphQL" for a skeptical view of the state of SPARQL today. I cannot say it any better myself.
Also note that SPARQL permits every endpoint to offer any number of named graphs (the GRAPH construct) that can be queried at that endpoint. That is one more feature to consider.
There is no central point regarding the notion of a Semantic Web of Linked Data. Instead, like any Super Information Highway, you have major concentration points (hubs or junctions) that enable you to discover routes to a variety of destinations.
Major Semantic Web of Linked Data hubs that we oversee at OpenLink Software include:
DBpedia
DBpedia-Live
URIBurner
LOD Cloud Cache
Remember, the fundamental principle behind Linked Open Data is that hyperlinks (HTTP URIs) function as words in sentences constructed using RDF Language. Thus, you can use the SPARQL Query Language to produce query solutions (tables or graphs) that expose desired routes (e.g., using Property Paths).
Finally, you can also use Federated SPARQL Query (SPARQL-FED) to navigate a Semantic Web of Linked Data.
Examples:
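# Note: the {1,3} path-length quantifier is supported by Virtuoso but is not part of the final SPARQL 1.1 syntax.
SELECT DISTINCT *
WHERE {
  ?s a <http://dbpedia.org/ontology/AcademicJournal> ;
     rdf:type{1,3} ?o
}
LIMIT 50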
Query Solution Document Link.
We are also working on a publicly available Google Spreadsheet that provides additional information related to the kinds of datasets accessible via the LOD Cloud that we maintain.
To my knowledge LOD-a-lot is currently the one ongoing effort that gets closest to the vision of querying the whole web of data. And this is obviously done using different means than SPARQL endpoints.
It's still a prototype, which means bugs, but one of the aims of wimuQ is to provide a way to query all 539 public SPARQL endpoints plus all datasets from LODLaundromat and LODStats. That is more than 600,000 datasets and more than 5 terabytes. As far as I know, it is the most extensive collection of datasets accessible from one single place.
For more information, the paper is available here:
I need to run a SPARQL query over a semantic database, but some of the triples are not going to be in the database; they are going to be provided by web services (and not through a SPARQL endpoint). I want to run a SELECT query that takes those additional triples into consideration without having to insert them into the database. Is there a way to do that?
This is not part of the SPARQL spec, so "no" is the general answer.
That said, Virtuoso (possibly among others) lets you include an external RDF source (a.k.a. a web service) in the FROM clause (among other methods), to be dereferenced during SPARQL query processing.
Such a web service need not be a SPARQL endpoint, but performance will be best if it provides RDF (though the serialization may vary).
The Virtuoso Sponger can also be invoked on the fly to derive RDF from many document formats (with an obvious performance hit). To pursue, please raise this to the OpenLink Community Forum.
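If your store offers nothing like this, the same idea can be approximated on the client side: match a pattern against the stored triples and the freshly fetched ones together, and never write the fetched ones back. The sketch below is only an illustration of that idea, not Virtuoso's mechanism. Triples are plain Python tuples, and `query_with_external` and `fetch_external` are hypothetical names; in practice the fetch function would call your web service and parse its response with an RDF library.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def match(triples: Iterable[Triple], pattern: Pattern) -> Iterator[Triple]:
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def query_with_external(
    stored: Iterable[Triple],
    fetch_external: Callable[[], Iterable[Triple]],
    pattern: Pattern,
) -> List[Triple]:
    """Match against stored plus fetched triples; nothing is written back."""
    return list(match(chain(stored, fetch_external()), pattern))


if __name__ == "__main__":
    stored = [("ex:alice", "ex:knows", "ex:bob")]

    def fake_service() -> List[Triple]:
        # Stand-in for a real web service call.
        return [("ex:bob", "ex:age", "42")]

    print(query_with_external(stored, fake_service, (None, "ex:age", None)))
```

This handles a single triple pattern only; joins, filters and the rest of SPARQL would need a real query engine over an in-memory union of the two sources.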
I've been trying to figure out how to set up a SPARQL endpoint for a couple of days, but however much I read, I cannot understand it.
To explain my intention: I have an open data server running on CKAN, and my goal is to be able to run SPARQL queries on the data. I know I cannot do this directly on the datasets themselves; I would have to define my own OWL ontology and convert the data I want to use from CSV (the format it is currently in) to RDF triples (to be used as linked data).
The idea was to test first with the metadata of the repositories, which can be generated automatically with the ckanext-dcat extension, but I really cannot find where to start. I've searched for information on how to install a Virtuoso server for SPARQL, but what I've found leaves a lot to be desired, and I cannot find anything that explains how I could actually load my own OWL and RDF files into Virtuoso.
Could someone lend me a hand with how to get started? Thank you.
I'm a little confused. Maybe this is two or more questions?
1. How to convert tabular data, like CSV, into the RDF semantic format?
This can be done with an R2RML approach. Karma is a great GUI for that purpose. Like you say, a conversion like that can really be improved with an underlying OWL ontology. But it can be done without creating a custom ontology, too.
I have elaborated on this in the answer to another question.
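To give a feel for what such a conversion produces, here is a minimal sketch of a plain direct mapping from CSV rows to N-Triples using only the Python standard library. It is not R2RML and not what Karma does internally; the `http://example.org/` namespace, the function names and the column names are assumptions for illustration. Every cell becomes a plain string literal, and empty cells are skipped.

```python
import csv
import io
from typing import List
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace; use your own


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples string literals require."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, id_column: str) -> List[str]:
    """Turn each CSV row into one subject and each other cell into one triple."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}resource/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the id column, surplus cells (column is None) and empty cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    sample = "id,name,city\n1,Ada,London\n2,Bob,\n"
    print("\n".join(csv_to_ntriples(sample, "id")))
```

A mapping driven by an ontology would replace the generated property IRIs with terms from that ontology and give values proper datatypes, which is where a tool like Karma helps.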
2. Now that I have some RDF formatted data, how can I expose it with a SPARQL endpoint?
Virtuoso is a reasonable choice. There are multiple ways to deploy it and multiple ways to load the data, and therefore lots of tutorials on the subject. Here's one good one, from DBpedia.
If you'd like a simpler path to starting an RDF triplestore with a SPARQL endpoint, Stardog and Blazegraph are available as JARs, and RDF4J can easily be deployed within a container like Tomcat.
All provide web-based graphical interfaces for loading data and running queries, in addition to SPARQL REST endpoints. At least Stardog also provides command-line tools for bulk loading.
I'm trying to build a query to fetch instances of / any subclasses of abstract elements such as "human" (Q5) by name, however the query fails with a timeout, probably because it has too many nodes to traverse in the graph.
Are there any better methods to query this? The best I could come up with is to call the Wikidata API's search-entities endpoint with the element name, and then filter the desired results in a SPARQL query, so that the query covers only those candidates instead of the whole graph.
I'm a little worried about using this method in a production environment, since the Wikidata SPARQL service is in beta. Are there any best practices for migrating knowledge graph use cases from Freebase? Is there any update regarding the migration of data from Freebase to Wikidata?
Finally are there any other mature alternatives to the deprecated Freebase service?
What endpoint are you querying against? Querying a shared public endpoint with no SLA (beta or not) for a production service is a very risky proposition.
Wikidata offers full database dumps that you can tailor/subset and load into whatever infrastructure you like. That would give you complete control over performance, quality, and any other metrics which are important to you.
As far as migrating from Freebase goes, there is no migration path. The track that train was on has come to an end (at least for external, non-Google users). It's not just deprecated; it was shut down completely a while ago. A tiny fraction of the data was imported into Wikidata (and the two already shared a lot due to their common ancestor, Wikipedia), but none of the programmatic features, such as MQL's JSON query-by-example, Freebase Search, Freebase Suggest, or Google-scale performance and availability, is available (yet?) for Wikidata.
If the data is important to you, you should self-host using whatever infrastructure meets your needs.
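As a rough idea of what subsetting a dump can look like, here is a minimal sketch that streams over the lines of a Wikidata JSON dump and keeps only entities with a direct "instance of" (P31) claim pointing at a given class, such as Q5. It assumes the usual dump layout of one entity per line inside a JSON array; the function name is made up for illustration. It does not follow "subclass of" (P279) chains, so instances of subclasses are not picked up.

```python
import json
from typing import Iterable, Iterator


def instances_of(dump_lines: Iterable[str], class_id: str) -> Iterator[dict]:
    """Yield entities whose P31 (instance of) claims include class_id."""
    for line in dump_lines:
        # Each entity line ends with a comma; the array brackets sit on their own lines.
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        entity = json.loads(line)
        for claim in entity.get("claims", {}).get("P31", []):
            snak = claim.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # "somevalue" / "novalue" snaks carry no target item
            if snak["datavalue"]["value"].get("id") == class_id:
                yield entity
                break


if __name__ == "__main__":
    import gzip
    import sys

    # Usage (assumed file name): python subset.py latest-all.json.gz
    with gzip.open(sys.argv[1], "rt", encoding="utf-8") as dump:
        for item in instances_of(dump, "Q5"):
            print(item["id"])
```

The selected entities can then be converted and loaded into whatever store you host yourself.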
Can I set up multiple triplestores in Virtuoso in the same way I create multiple databases in, for example, a conventional MySQL DBMS? Each database would be independent, with (possibly) its own SPARQL endpoint.
Yes, you can, at least as far as I understood your question.
You can add additional datasets to the Virtuoso triple store under a new graph, which you then name in the FROM clause of your queries to indicate the named graph you want your results to come from:
create graph <http://myNewAndShinyGraph.org/some/path>;
Now you can add/upload your dataset into the triplestore under the new context you created (as usual, via SPARQL INSERT, TTLP or ld_dir...).
You can also expose this graph with a different SPARQL endpoint.
Follow the steps described by Hugh Williams here: Defining endpoints in Virtuoso
Also of interest: How to create a SPARQL endpoint using Virtuoso?
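For illustration, the two statements this workflow needs after the graph exists can be sketched as plain strings: an INSERT DATA that targets the graph and a SELECT that reads from it. The helper names are hypothetical and the code only builds the SPARQL text; how you submit it (isql, the web interface, or an HTTP client) depends on your setup.

```python
from typing import Iterable

GRAPH = "http://myNewAndShinyGraph.org/some/path"


def insert_into_graph(graph_iri: str, ntriples: Iterable[str]) -> str:
    """Build a SPARQL Update statement that adds triples to one named graph."""
    body = "\n".join(f"    {triple}" for triple in ntriples)
    return "\n".join([
        "INSERT DATA {",
        f"  GRAPH <{graph_iri}> {{",
        body,
        "  }",
        "}",
    ])


def select_from_graph(graph_iri: str, limit: int = 10) -> str:
    """Build a query whose results come only from the given graph."""
    return "\n".join([
        "SELECT ?s ?p ?o",
        f"FROM <{graph_iri}>",
        "WHERE { ?s ?p ?o }",
        f"LIMIT {limit}",
    ])


if __name__ == "__main__":
    triple = '<http://example.org/s> <http://example.org/p> "o" .'
    print(insert_into_graph(GRAPH, [triple]))
    print(select_from_graph(GRAPH, limit=5))
```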
Your question is extremely broad and difficult to answer both concisely and usefully. The short answer is "Yes," but that seems less than useful.
Virtuoso (produced by my employer, OpenLink Software) is a "conventional" SQL-style DBMS, akin to MySQL, PostgreSQL, Oracle, SQL Server, etc., though being a hybrid engine, it is also a NoSQL, graph/RDF, XML, object, and various other style DBMS. In the graph/RDF realm, it is actually a Quad Store, which can be used either as a simple triplestore or as a collection of Named Graphs, each of which might be considered a separate triplestore...
One Virtuoso DB file may contain multiple SQL-style CATALOGS, as well as multiple Named Graphs and other divisions of RDF/graph data, for which you can set up distinct SPARQL endpoints -- or you can set up distinct DB files (and Virtuoso instances), each with one database/data set. There may be other options appropriate to your needs ...
Virtuoso-specific questions are often better raised in Virtuoso-specific areas, such as the public Virtuoso Users mailing list, the public OpenLink Support Forums, a confidential OpenLink Support Case, etc.
I'm new to the Semantic Web.
I'm trying to build a sample application where I can query data from different data sources in one query.
I have created a small RDF file which contains references to DBpedia resources for defining localities. My question is: how can I get the data contained in my file together with other information from the description of the remote resource (for example, the name of a person from the local file and the total population of a city, dbpedia-owl:populationTotal, from the remote RDF)?
I don't really understand the SPARQL query language. I tried to use the Jena ARQ API with the SERVICE keyword, but it doesn't solve the problem.
Any help please?
I guess you are looking for something like the Semantic Web Client Library, which tries to leverage the GGG (Giant Global Graph). Note, however, that the standard exploration algorithm of this framework follows rdfs:seeAlso links. Nevertheless, the general approach seems to be what you are looking for: you create a local graph that consists of your seed graph, traverse its relations up to a certain depth (e.g., three steps), resolve the URIs, and load that content into your local triple store. Utilising advanced technologies like SPARQL federation might be something for later ;)
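The traversal idea described above can be sketched in a few lines. This is not the Semantic Web Client Library's API, just a breadth-first crawl under simplifying assumptions: triples are plain tuples, every object that starts with "http" is treated as a URI worth following (not only rdfs:seeAlso targets), and `fetch` is a hypothetical function you supply that dereferences a URI and returns the triples found there.

```python
from collections import deque
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def crawl(
    seed_triples: Iterable[Triple],
    fetch: Callable[[str], Iterable[Triple]],
    max_depth: int = 3,
) -> Set[Triple]:
    """Grow a local graph by following object URIs up to max_depth steps."""
    graph = set(seed_triples)
    visited: Set[str] = set()
    # URIs mentioned in the seed graph are one step away.
    queue = deque((obj, 1) for _, _, obj in graph if obj.startswith("http"))
    while queue:
        uri, depth = queue.popleft()
        if uri in visited or depth > max_depth:
            continue
        visited.add(uri)
        for triple in fetch(uri):
            graph.add(triple)
            if triple[2].startswith("http"):
                queue.append((triple[2], depth + 1))
    return graph


if __name__ == "__main__":
    # A fake "web" standing in for real HTTP dereferencing.
    web = {
        "http://ex/a": [("http://ex/a", "p", "http://ex/b")],
        "http://ex/b": [("http://ex/b", "p", "http://ex/c")],
    }
    seed = [("me", "knows", "http://ex/a")]
    print(sorted(crawl(seed, lambda uri: web.get(uri, []), max_depth=2)))
```

Once the local graph has been collected this way, an ordinary SPARQL query over it can combine your own data with the remote descriptions.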
I have retrieved data from two different sources using a SPARQL query with named graphs.
I used Jena ARQ to execute the SPARQL query.