SPARQL CONSTRUCT expressivity

Are there any metrics or analysis on how expressive SPARQL CONSTRUCT queries are? Are there graphs or transformations that can't be expressed via CONSTRUCT? What are the limitations?

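SPARQL query evaluation is PSPACE-complete, like SQL, regardless of which query form (SELECT, CONSTRUCT, ASK, or DESCRIBE) you use.
I'd say the primary limitation of CONSTRUCT queries is that they cannot construct quads: the result is a single graph of triples, with no way to say which named graph a triple belongs to.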

A list of arbitrary, variable length cannot be built in a single CONSTRUCT. The CONSTRUCT template is a fixed pattern, so it cannot be written to cover an unknown number of list elements.
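To make the "fixed pattern" point concrete, here is a minimal Python sketch of how a CONSTRUCT template is instantiated. It is a toy model, not a SPARQL engine: the function name `instantiate`, the string encoding of terms, and the sample data are all assumptions made for illustration. The template is applied once per solution, and template blank nodes are fresh for each solution, so nothing links the list cell built for one solution to the cell built for the next.

```python
from itertools import count

# Toy model (hypothetical): terms are strings, "?x" is a variable,
# "_:x" is a blank node written in the template.
_fresh = count()

def instantiate(template, solutions):
    """Apply the fixed template once per solution, as CONSTRUCT does."""
    graph = set()
    for solution in solutions:
        bnodes = {}  # template blank nodes are scoped to a single solution
        for pattern in template:
            triple = []
            for term in pattern:
                if term.startswith("?"):
                    term = solution.get(term)  # None if the variable is unbound
                elif term.startswith("_:"):
                    if term not in bnodes:
                        bnodes[term] = f"_:b{next(_fresh)}"
                    term = bnodes[term]
                triple.append(term)
            if None not in triple:  # triples with unbound variables are skipped
                graph.add(tuple(triple))
    return graph

# A template that tries to build an RDF list cell for each item.
template = [("_:cell", "rdf:first", "?item"), ("_:cell", "rdf:rest", "rdf:nil")]
solutions = [{"?item": "ex:a"}, {"?item": "ex:b"}, {"?item": "ex:c"}]
graph = instantiate(template, solutions)
```

With three solutions this yields three separate one-element lists (six triples), not one three-element list: each `rdf:rest` can only point at something the template names, and the template cannot name the cell of another solution.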

Related

Graph URIs on GraphDB

The SPARQL graph concept is still abstract for me.
Are graphs similar to SQL tables? Can one triple belong to more than one graph?
Is there a way to get the URI of the graph to which a triple belongs?
Is there a way to get the URI of every graph stored in a GraphDB repository? To get the URI of the default graph?
Thanks,

In addition to UninformedUser's answers, graphs can also be listed with the following cURL command: curl 'http://<host>:<port>/repositories/<your_repo>/contexts'.
You can also easily explore and export the content of graphs in a GraphDB repository using the Workbench.
The value of the default graph can be found there as well.
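The same request can be sketched in Python using only the standard library. The function names are hypothetical, and the sketch assumes the endpoint returns SPARQL JSON results with a `contextID` binding, which is my understanding of the RDF4J REST protocol that GraphDB exposes; check it against your server's actual response.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

def contexts_url(host, port, repository):
    """Build the URL used in the cURL command above."""
    return f"http://{host}:{port}/repositories/{quote(repository, safe='')}/contexts"

def parse_contexts(results):
    """Extract graph URIs from a SPARQL JSON result (assumes a 'contextID' binding)."""
    return [row["contextID"]["value"] for row in results["results"]["bindings"]]

def list_contexts(host, port, repository):
    """Fetch and parse the list of named graphs; needs a reachable server."""
    request = Request(
        contexts_url(host, port, repository),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return parse_contexts(json.load(response))
```

For example, `list_contexts("localhost", 7200, "my_repo")` would query a local instance; the host, port, and repository name here are placeholders.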

To understand the differences, we could refer first to the data model itself:
Relational model
SQL operates on relational databases. The relational model, in general terms, is a way to model your data and its dependencies as structured tables. If we want to query information that is spread over more than one table, we perform a join operation, which retrieves the relationships between elements in different tables.
The join operator often uses keys that are common among the tables.
One drawback of this model appears when you need to perform many such join operations to retrieve the desired information. In that case, the response can become slow and performance can drop significantly.
Graph model
In contrast, SPARQL is a query language that has a SQL-like syntax but operates on graph data. The graph data model does not store the data as tables. Instead, the data lives in a triple store (another option is a property graph, but your question refers to a triple store).
RDF triples are the W3C standard way of describing statements in the graph data model. A triple consists of a Subject, a Predicate, and an Object.
The Subject is the entity to which the triple refers, the Predicate is the type of relationship it has, and the Object is the entity or property to which the Subject connects.
The Subject always has an identifier (e.g., a URI), whereas the Object can be a literal or another entity with a URI.
Further comparison
You can have a look at a more detailed comparison of the models in chapter 2 (Options for Storing Connected Data) of the book Graph Databases, 2nd Edition, by Ian Robinson, Jim Webber, and Emil Eifrem.
Coming back to your questions
Are graphs similar to SQL tables? Can one triple belong to more than one graph?
No, graphs are not SQL tables.
And yes, one triple can appear in different graphs (it is the same triple as long as it has the same Subject, Predicate, and Object). However, if you use different ontologies, the triple would appear in a different context.
Is there a way to get the URI of the graph to which a triple belongs? Is there a way to get the URI of every graph stored in a GraphDB repository? To get the URI of the default graph?
SPARQL queries are built from graph patterns. You need to specify the type of structure you are searching for.
As UninformedUser commented:
All graphs that contain at least one triple
select distinct ?g {graph ?g {?s ?p ?o}}
All graphs that contain a specific triple
select distinct ?g {graph ?g {:s :p :o}}
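As a rough mental model of what these two queries do, a dataset can be pictured as a mapping from graph URI to a set of triples. The Python sketch below is only that, a toy model with hypothetical names (`Dataset`, `graphs`, `graphs_containing`) and made-up URIs; it is not how GraphDB stores data.

```python
from collections import defaultdict

class Dataset:
    """Toy model: each named graph is a set of (subject, predicate, object) triples."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph, s, p, o):
        self._graphs[graph].add((s, p, o))

    def graphs(self):
        # Mirrors: select distinct ?g {graph ?g {?s ?p ?o}}
        return sorted(g for g, triples in self._graphs.items() if triples)

    def graphs_containing(self, s, p, o):
        # Mirrors: select distinct ?g {graph ?g {:s :p :o}}
        return sorted(g for g, triples in self._graphs.items() if (s, p, o) in triples)

ds = Dataset()
ds.add("ex:g1", ":s", ":p", ":o")
ds.add("ex:g2", ":s", ":p", ":o")      # the same triple in a second graph
ds.add("ex:g2", ":s", ":p", ":other")
```

Here `ds.graphs_containing(":s", ":p", ":o")` returns both graphs, which is the sense in which one triple can belong to more than one graph.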

MarkLogic: Constrain SPARQL query scope by triple-range-query constraint

I would like to evaluate a SPARQL query against a limited document scope, which is based on a triple range query. Only embedded triples contained by documents which match a specific triple pattern should be part of the SPARQL evaluation scope. I'm using the Java SDK (via marklogic-rdf4j) to evaluate the SPARQL query. We're only using embedded/unmanaged triples.
I'm aware of the possibility to attach a structured query definition to a SPARQL query (by calling MarkLogicQuery::setConstrainingQueryDefinition), but the structured query syntax does not support triple-range-query constraints.
Is there any way to apply one or more triple-range-query constraints in a structured query definition? Or are there better alternatives?

Support for triple-range-query in structured queries has been requested before. I added your case to the ticket.
In the meantime you might get away with using a custom constraint. A colleague and I put this together:
https://github.com/patrickmcelwee/triple-range-constraint/blob/master/triple-range-constraint.xqy
HTH!

SPARQL vs XQuery (MarkLogic)

After playing with MarkLogic I realized that results from triples can be obtained in several ways, for example by using either XQuery or SPARQL exclusively. So the question is: are there any advantages to using SPARQL over XQuery? Is there some indexing going on that makes SPARQL much faster than searching for a certain semantic query?
For instance if we are retrieving all semantic documents with the predicate "/like".
SPARQL
SELECT *
WHERE {
?s </like> ?o
}
XQuery
cts:search(fn:doc(), cts:element-query(xs:QName("sem:predicate"), "/like"))
Therefore, is there any difference in efficiency between these two?

Yes, there are definitely differences. Whether XQuery or SPARQL is more efficient, however, depends entirely on the problem you are trying to solve. XQuery is best at querying and processing document data, while SPARQL really allows you to reason easily over RDF data.
It is true that RDF data is serialized as XML in MarkLogic, and you can full-text search it, and even put range indexes on it if you like, but RDF data is already indexed in the triple index, which would give you more accurate results than the full-text search above.
Also note that SPARQL allows you to follow predicate paths, which involves a lot of joining. That will be much more efficient if done via SPARQL than via XQuery, because it is mostly resolved via the triple index. Imagine a SPARQL query like this one:
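PREFIX pers: <http://my.persons/>
PREFIX topic: <http://my.topics/>
PREFIX pred: <http://my.predicates/>
SELECT DISTINCT *
WHERE {
  # ?person likes chocolate and reaches ?friend over one or more friendOf links
  ?person pred:likes topic:Chocolate ;
          pred:friendOf+ ?friend .
  FILTER( ?friend = pers:WhiteSolstice )
  # a friendOf cycle could lead back to the same person; exclude that case
  FILTER( ?friend != ?person )
}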
It finds everyone who likes chocolate and is a direct or indirect friend of WhiteSolstice. I wouldn't write something like that in XQuery.
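To see why the path `pred:friendOf+` amounts to "a lot of joining", here is a small Python sketch of what a one-or-more path computes: every node reachable over at least one edge. The function name and the sample friendships are made up for illustration, and this says nothing about how MarkLogic actually evaluates paths against the triple index.

```python
from collections import deque

def one_or_more(edges, start):
    """Nodes reachable from start by following one or more edges (like pred+)."""
    adjacency = {}
    for subject, obj in edges:
        adjacency.setdefault(subject, set()).add(obj)
    reached = set()
    queue = deque(adjacency.get(start, ()))  # begin with the direct neighbours
    while queue:
        node = queue.popleft()
        if node in reached:
            continue
        reached.add(node)
        queue.extend(adjacency.get(node, ()))
    return reached

# Hypothetical friendOf edges: (person, friend)
friend_of = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "alice")]
```

Because of the cycle, `one_or_more(friend_of, "alice")` contains `"alice"` itself, which is the kind of result that a filter like `?friend != ?person` removes. Written as explicit joins, each extra hop would need another self-join over the friendOf triples.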
Then again, there are other things that are easy in XQuery and practically impossible in SPARQL. And sometimes it is most efficient to combine the two, doing a sem:sparql from inside XQuery and using the results to direct further processing in XQuery. It also sometimes comes down to what shape your data is in.
HTH!

A little nuance here: search is about searching for documents. Unless you have one triple per document, fetching just the matching triples out of a bunch in a document will involve pulling the whole document from disk (although it may be in cache). SPARQL is about selecting triple data from the triple indexes, which may involve less disk IO. Certainly if you are doing anything other than a simple fetch of a simple triple pattern, you're going to need the understanding of relationships that SPARQL gives you.

What's the motivation behind implementing BigQuery UDFs as map in MapReduce?

Google BigQuery now supports UDFs that work like mappers in MapReduce.
BigQuery supports user-defined functions (UDFs) written in JavaScript. A UDF is similar to the "Map" function in a MapReduce: it takes a single row as input and produces zero or more rows as output. The output can potentially have a different schema than the input.
From https://cloud.google.com/bigquery/user-defined-functions
What's the motivation behind implementing UDFs on rows rather than allowing UDFs that work as pure functions on columns/fields, like UDFs in Hive (https://cwiki.apache.org/confluence/display/Hive/HivePlugins)?
I guess you can express any UDF that works on a column (like a Hive UDF) as a UDF that works on rows (a BigQuery UDF), but not vice versa. That would be possible by defining a UDF (in BigQuery) with the same input and output schema as the dataset, passing all values through unchanged except the field that you want to apply your function to.
This is of course cumbersome if you want to apply the same function to different datasets with different schemas. Please help me understand.
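The wrapping described above can be sketched in a few lines. This is plain Python used to illustrate the idea, not BigQuery's JavaScript UDF API; `lift`, `apply_udf`, and the sample rows are hypothetical names and data.

```python
def lift(scalar_fn, field):
    """Turn a column-level function into a row-level, map-style UDF."""
    def row_udf(row):
        out = dict(row)                    # pass every field through unchanged...
        out[field] = scalar_fn(row[field])  # ...except the one being transformed
        yield out                          # a row-level UDF emits zero or more rows
    return row_udf

def apply_udf(row_udf, rows):
    """Run a row-level UDF over a table, flattening whatever it emits."""
    return [out for row in rows for out in row_udf(row)]

def split_words(row):
    """A row-level UDF with no column-level equivalent: one output row per word."""
    for word in row["text"].split():
        yield {"id": row["id"], "word": word}

rows = [{"id": 1, "text": "hello world"}, {"id": 2, "text": ""}]
upper_text = lift(str.upper, "text")
```

`lift` works for any schema because it copies the row, which addresses the schema-dependence concern in this sketch, whereas `split_words` changes both the row count and the schema, something a pure column function cannot express.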

The current implementation of UDFs in BigQuery is just the first step. As you note, it is the most generic way if you want to be able to deal with nested and repeated structures, but it is cumbersome when you just want simple scalar values. Expect future improvements in this area where simple UDFs will be simple.

Suitability of MongoDB for equivalent of XPath

I am very interested in using MongoDB for a variety of reasons. It suits many of my needs well.
However, I also need to perform the equivalent of an XPath query. I have a complex hierarchical document. I need to be able to extract specific nodes (and their children) based on parameter matching. Something like:
Give me the document structure starting at node x where the attribute "level" is null or 1.
Can MongoDB do this and if so, how can I go about it? Or should I stick to PostgreSQL / SQL Server for this type of work?

Wrong tool. Use a database providing explicit support for hierarchical data, such as a graph database or an RDBMS with support for XML (if you are using XML). MongoDB is not suited for this purpose.
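To make the requirement concrete, here is a sketch of the requested extraction done in application code over a document that has already been loaded as a nested dict. It is not a MongoDB query and makes no claim about what MongoDB can do server-side. The `children` key and the function names are assumptions; only the `level` attribute comes from the question.

```python
def matching_subtrees(node, predicate):
    """Yield every node (with its children) whose own attributes satisfy predicate."""
    if predicate(node):
        yield node
    for child in node.get("children", []):
        yield from matching_subtrees(child, predicate)

def level_is_null_or_one(node):
    """True when 'level' is missing, null, or 1."""
    return node.get("level") in (None, 1)

# Hypothetical document shape.
document = {
    "name": "root", "level": 0,
    "children": [
        {"name": "a", "level": 1, "children": [{"name": "a1", "level": 2}]},
        {"name": "b", "level": None},
        {"name": "c", "level": 3, "children": [{"name": "c1"}]},
    ],
}
names = [node["name"] for node in matching_subtrees(document, level_is_null_or_one)]
```

Each yielded node keeps its children, so the subtree rooted at `a` still includes `a1` even though `a1` itself does not match.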
