MarkLogic: Constrain SPARQL query scope by triple-range-query constraint - sparql

I would like to evaluate a SPARQL query against a limited document scope, which is based on a triple range query. Only embedded triples contained by documents which match a specific triple pattern should be part of the SPARQL evaluation scope. I'm using the Java SDK (via marklogic-rdf4j) to evaluate the SPARQL query. We're only using embedded/unmanaged triples.
I'm aware of the possibility to attach a structured query definition to a SPARQL query (by calling MarkLogicQuery::setConstrainingQueryDefinition), but the structured query syntax does not support triple-range-query constraints.
Is there any way to apply one or more triple-range-query constraints in a structured query definition? Or are there better alternatives?

Support for triple-range-query in structured queries has been requested before. I added your case to the ticket.
In the meantime, you might get away with using a custom constraint. A colleague and I put this together:
https://github.com/patrickmcelwee/triple-range-constraint/blob/master/triple-range-constraint.xqy
HTH!

Related

SPARQL CONSTRUCT expressivity

Are there any metrics or analysis on how expressive SPARQL CONSTRUCT queries are? Are there graphs or transformations that can't be expressed via CONSTRUCT? What are the limitations?
SPARQL is PSPACE-complete, like SQL. It doesn't matter which query form you're using.
I'd say the primary limitation of construct queries is that they cannot construct quads.
An arbitrary variable length list is not possible in a single CONSTRUCT. The template can't be written because the CONSTRUCT template is a fixed pattern.

What's the common practice in constituting the WHERE clause based on the user input

If we take a database table, we can query all the rows or we can choose to apply a filter to it. The filter can vary depending on the user input. When there are only a few options, we can write a separate query for each specific condition. But if there are lots and lots of options that the user might or might not specify, that method does not come in handy. I know I can compose the filter from the user input, send it as a string parameter to the corresponding stored procedure, build the query with that filter, and finally execute the query string with EXECUTE IMMEDIATE (in Oracle's case). I don't know why, but I really don't like this way of building queries. I think it leaves the door open to SQL injection. Besides that, I always have trouble with the query itself, because everything is just a string and I need to handle dates and numbers carefully. What is the best and most widely used method of forming the WHERE clause of a query against a database table?
Using database parameters instead of attempting to quote your literals is the way forward.
This will guard you against SQL injection.
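As a minimal sketch of what binding a parameter looks like, the example below uses Python's built-in sqlite3 module as a stand-in for the real database; the `product` table and the `find_by_name` helper are made up for illustration. Oracle drivers use named placeholders such as `:name` rather than `?`, but the idea is the same: the value is sent separately from the SQL text.

```python
import sqlite3

# In-memory database with a hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("widget", 5.0), ("gadget", 12.5)],
)


def find_by_name(conn, name):
    # The value is bound to the "?" placeholder, so it is never parsed as SQL.
    cur = conn.execute(
        "SELECT name, price FROM product WHERE name = ?", (name,)
    )
    return cur.fetchall()


rows_ok = find_by_name(conn, "widget")
# A classic injection attempt is treated as an ordinary (non-matching) string.
rows_attack = find_by_name(conn, "x' OR '1'='1")
```

Because dates and numbers are bound as typed values, this also sidesteps most of the formatting trouble that comes with building literals by hand.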
A common way of approaching this problem is building expression trees that represent your query criteria, converting them to parameterized SQL (to avoid SQL injection risks), binding parameter values to the generated SQL, and executing the resultant query against your target database.
The exact approach depends on your client programming framework: .NET has Entity Framework and LINQ2SQL, which both support expression trees; Java has Hibernate and JPA, and so on. I have seen several different frameworks used to construct customizable queries with a great deal of success. In situations where these frameworks are not available, you can roll your own, although it requires a lot more work.
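For the roll-your-own case, the expression-tree idea can be sketched roughly as follows. The classes `Cmp`, `And`, `Or`, the function `compile_expr`, and the column whitelist are all hypothetical names invented for this illustration, and the `?` placeholder style is an assumption that depends on the database driver.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple, Union


@dataclass
class Cmp:
    """Leaf node: column <op> value."""
    column: str
    op: str
    value: Any


@dataclass
class And:
    parts: List["Expr"]


@dataclass
class Or:
    parts: List["Expr"]


Expr = Union[Cmp, And, Or]

# Only identifiers and operators listed here may appear in generated SQL.
ALLOWED_COLUMNS = {"name", "price", "created"}
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def compile_expr(expr: Expr) -> Tuple[str, list]:
    """Turn a criteria tree into (sql_fragment, parameter_list)."""
    if isinstance(expr, Cmp):
        if expr.column not in ALLOWED_COLUMNS or expr.op not in ALLOWED_OPS:
            raise ValueError("column or operator not allowed")
        # The value is never placed in the SQL text, only in the parameters.
        return f"{expr.column} {expr.op} ?", [expr.value]
    if not isinstance(expr, (And, Or)):
        raise TypeError("unknown expression node")
    if not expr.parts:
        raise ValueError("empty AND/OR group")
    joiner = " AND " if isinstance(expr, And) else " OR "
    fragments, params = [], []
    for part in expr.parts:
        sql, sub_params = compile_expr(part)
        fragments.append(sql)
        params.extend(sub_params)
    return "(" + joiner.join(fragments) + ")", params


criteria = And([
    Cmp("price", ">=", 10),
    Or([Cmp("name", "=", "gadget"), Cmp("name", "=", "gizmo")]),
])
where_sql, where_params = compile_expr(criteria)
```

The resulting fragment would be appended after `WHERE` and executed with `where_params` as the bound values; user input only ever ends up in the parameter list.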

Solr/Lucene: What is the difference between regular queries and filter queries

I'm currently implementing a Solr solution where a user is able to select various options to search for a product. I can now take all those options and put them together into one single long query, or I can use a query that fetches everything (*:*) and applies query filters to it.
Regular query:
q=color:blue AND price:500
Query using filter queries:
q=*:*&fq=color:blue&fq=price:500
The result is exactly the same. So what is the difference? When should I use one or the other?
Filter queries do not influence scores of the document.
Furthermore, they are useful for caching: queries specified with fq are cached independently of the main query.
See the Solr documentation on query parameters.
Typically in any production system you would use a variant of the Dismax request handler which doesn't support the former syntax, hence filtering must be performed using filter queries in that case.
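As a rough sketch of how a client might assemble the two request styles from the question, the snippet below builds the query string with Python's standard library. The helper name `build_solr_params` is made up for illustration, and the host and core path of the Solr endpoint are deliberately left out.

```python
from urllib.parse import urlencode


def build_solr_params(main_query, filters):
    """Encode one scored main query plus any number of fq filters."""
    params = [("q", main_query)]
    # Each filter becomes its own fq parameter, so each can be cached separately.
    params.extend(("fq", f) for f in filters)
    return urlencode(params)


# Everything in q: both clauses contribute to the relevance score.
scored = build_solr_params("color:blue AND price:500", [])

# Match all documents, then narrow with filters that do not affect the score.
filtered = build_solr_params("*:*", ["color:blue", "price:500"])
```

Keeping each user-selected option in its own `fq` means a filter such as `color:blue` can be reused from the cache even when the other options change.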

Suitability of MongoDB for equivalent of XPath

I am very interested in using MongoDB for a variety of reasons. It suits many of my needs well.
However, I also need to perform the equivalent of an XPath query. I have a complex hierarchical document. I need to be able to extract specific nodes (and their children) based on parameter matching. Something like:
Give me the document structure starting at node x where the attribute "level" is null or 1.
Can MongoDB do this and if so, how can I go about it? Or should I stick to PostgreSQL / SQL Server for this type of work?
Wrong tool. Use a database that provides explicit support for hierarchical data, such as a graph database or an RDBMS with XML support (if you are using XML). MongoDB is not suited to this purpose.

How to create dynamic and safe queries

A "static" query is one that remains the same at all times. For example, the "Tags" button on Stackoverflow, or the "7 days" button on Digg. In short, they always map to a specific database query, so you can create them at design time.
But I am trying to figure out how to do "dynamic" queries where the user basically dictates how the database query will be created at runtime. For example, on Stackoverflow, you can combine tags and filter the posts in ways you choose. That's a dynamic query albeit a very simple one since what you can combine is within the world of tags. A more complicated example is if you could combine tags and users.
First of all, when you have a dynamic query, it sounds like you can no longer use the substitution API to avoid SQL injection, since the query elements will depend on what the user decided to include in the query. I can't see how else to build this query other than by appending strings.
Secondly, the query could potentially span multiple tables. For example, if SO allows users to filter based on Users and Tags, and these probably live in two different tables, building the query gets a bit more complicated than just appending columns and WHERE clauses.
How do I go about implementing something like this?
The first rule is that users are allowed to specify values in SQL expressions, but not SQL syntax. All query syntax should be literally specified by your code, not user input. The values that the user specifies can be provided to the SQL as query parameters. This is the most effective way to limit the risk of SQL injection.
Many applications need to "build" SQL queries through code, because as you point out, some expressions, table joins, order by criteria, and so on depend on the user's choices. When you build a SQL query piece by piece, it's sometimes difficult to ensure that the result is valid SQL syntax.
I worked on a PHP class called Zend_Db_Select that provides an API to help with this. If you like PHP, you could look at that code for ideas. It doesn't handle any query imaginable, but it does a lot.
Some other PHP database frameworks have similar solutions.
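The rule that users supply values but never syntax can be sketched in a few lines. This is not how Zend_Db_Select works internally; it is only an illustration, in Python with sqlite3, using a made-up schema (`post`, `author`, `post_tag`) and a hypothetical `build_post_query` helper. Every piece of SQL text comes from dictionaries written by the developer, and the user's choices only select entries and provide bound values.

```python
import sqlite3

# Developer-written SQL fragments, keyed by the filter names users may pick.
FILTERS = {
    "tag": ("JOIN post_tag pt ON pt.post_id = p.id", "pt.tag = ?"),
    "author": ("JOIN author a ON a.id = p.author_id", "a.name = ?"),
}
SORTS = {"newest": "p.id DESC", "oldest": "p.id ASC"}


def build_post_query(choices, sort="newest"):
    """choices maps a filter name to the value the user entered."""
    joins, wheres, params = [], [], []
    for key, value in choices.items():
        join_sql, where_sql = FILTERS[key]  # unknown names raise KeyError
        joins.append(join_sql)
        wheres.append(where_sql)
        params.append(value)  # values are bound, never concatenated
    sql = "SELECT p.id, p.title FROM post p"
    if joins:
        sql += " " + " ".join(joins)
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    sql += " ORDER BY " + SORTS[sort]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    CREATE TABLE post_tag (post_id INTEGER, tag TEXT);
    INSERT INTO author VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO post VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 1);
    INSERT INTO post_tag VALUES (1, 'sql'), (2, 'sql'), (3, 'php');
""")

sql, params = build_post_query({"tag": "sql", "author": "ann"})
rows = conn.execute(sql, params).fetchall()
```

The query is still assembled by string concatenation, but none of the concatenated text originates from the user, which is what keeps it safe across multiple tables.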
Though not a general solution, here are some steps that you can take to mitigate the dynamic yet safe query issue.
Criteria in which a column value must belong to a set of values of arbitrary cardinality do not need to be dynamic. Consider using either the instr function or a special filtering table that you join against. This approach can easily be extended to multiple columns, as long as the number of columns is known. Filtering on users and tags could easily be handled this way.
When the number of columns in the filtering criteria is arbitrary yet small, consider using different static queries for each possibility.
Only when the number of columns in the filtering criteria is arbitrary and potentially large should you consider using dynamic queries. In which case...
To be safe from SQL injection, either build or obtain a library that defends against that attack. Though more difficult, this is not an impossible task. This is mostly about escaping SQL string delimiters in the values to filter for.
To be safe from expensive queries, consider using views that are specially crafted for this purpose and some up front logic to limit how those views will get invoked. This is the most challenging in terms of developer time and effort.
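The filtering-table idea from the first step can be sketched like this, using Python's sqlite3 module; the table names `post_tag` and `wanted_tag` and the helper `posts_with_tags` are assumptions made for the example. The SELECT statement is a fixed string, and only the rows in the temporary table change with the user's input.

```python
import sqlite3


def posts_with_tags(conn, tags):
    """Return ids of posts carrying any of the given tags, via a static query."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS wanted_tag (tag TEXT PRIMARY KEY)"
    )
    conn.execute("DELETE FROM wanted_tag")
    # Any number of values can be loaded; each one is a bound parameter.
    conn.executemany(
        "INSERT OR IGNORE INTO wanted_tag (tag) VALUES (?)",
        [(t,) for t in tags],
    )
    cur = conn.execute(
        "SELECT DISTINCT pt.post_id FROM post_tag pt "
        "JOIN wanted_tag w ON w.tag = pt.tag "
        "ORDER BY pt.post_id"
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post_tag (post_id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO post_tag VALUES (?, ?)",
    [(1, "sql"), (2, "sql"), (3, "php")],
)

any_of_two = posts_with_tags(conn, ["sql", "php"])
only_php = posts_with_tags(conn, ["php"])
```

In a multi-user setting the filter table would need to be scoped per session or per request; a temporary table, as assumed here, is one way to get that.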
If you were using Python to access your database, I would suggest you use the Django model system. There are many similar APIs, both for Python and for other languages (notably in Ruby on Rails). I am saving so much time by avoiding the need to talk directly to the database with SQL.
From the example link:
    # Model definition
    from django.db import models

    class Blog(models.Model):
        name = models.CharField(max_length=100)
        tagline = models.TextField()

        def __unicode__(self):
            return self.name

Model usage (this is effectively an insert statement):

    from mysite.blog.models import Blog

    b = Blog(name='Beatles Blog', tagline='All the latest Beatles news.')
    b.save()
The queries can get much more complex: you pass around a query object and add filters and sort elements to it. When you are finally ready to use the query, Django creates an SQL statement that reflects all the ways you adjusted the query object. I think that it is very cute.
Other advantages of this abstraction
Your models can be created as database tables with foreign keys and constraints by Django
Many databases are supported (PostgreSQL, MySQL, SQLite, etc.)
Django analyses your models and creates an automatic admin site from them.
Well the options have to map to something.
Concatenating a SQL query string isn't a problem as long as you still use parameters for the option values.