Suitability of MongoDB for equivalent of XPath - sql

I am very interested in using MongoDB for a variety of reasons. It suits many of my needs well.
However, I also need to perform the equivalent of an XPath query. I have a complex hierarchical document. I need to be able to extract specific nodes (and their children) based on parameter matching. Something like:
Give me the document structure starting at node x where the attribute "level" is null or 1.
Can MongoDB do this and if so, how can I go about it? Or should I stick to PostgreSQL / SQL Server for this type of work?

Wrong tool. Use a database that provides explicit support for hierarchical data, such as a graph database or an RDBMS with XML support (if you are using XML). MongoDB is not suited to this purpose.

Related

How to represent an SQL query graphically

I have a SQL SELECT query with many joins between tables. Which kind of diagram could represent it graphically, so that I can visualise the joins between tables and their types (differentiating between INNERs and LEFTs)?
I drew this simple schema to represent my query, but I'm looking for a known and better type of diagram:
I believe what you're looking for is a variation of an Entity Relationship Diagram, where the different line-ends indicate the relationship type. When structuring a database, I believe this type of model is most common and easily understood.
You can use crow's foot notation for an Entity Relationship Diagram, where you can specify the relationship (one to many, one to one) as well as the optionality (entityA has exactly one of entityB, or entityA has 0 or 1 of entityB). In your case the optionality might represent the type of join you need to do. Why exactly do you need to specify the join types in an ERD?
I'm baffled why this isn't something that exists already, and even more baffled why more developers aren't screaming their heads off demanding this. There is an undeniable need for dynamically created visual representations of SQL queries. Trying to understand a single 300-line SQL procedure is difficult enough without column names obfuscating their business meaning with names like "AsecRemFortsKilnNumAttr".
When there are hundreds of columns in a procedure with names like this, and subtly different ones like "AsecRemAgsKilnNumAttr" and "AsecRemFortsKilnNumV" as well, the task is impossible.
Anyway, here are a few I've found:
https://sqldep.com/sql-parser/
http://queryviz.com/
https://sourceforge.net/projects/revj/
The last one contains a Python utility that you can improve and extend if you know how.
Other than that, the only other tool that may allow what you're looking for is Informatica Developer 10, which lets you create a mapping from imported SQL queries.
SQL queries are usually represented in tree form, structured according to how the query is parsed (logical plan) or how the query engine will execute it (physical plan).
See abstract syntax tree (AST) https://en.wikipedia.org/wiki/Abstract_syntax_tree as an example.
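If no existing tool fits, one low-effort option is to describe the joins as data and let Graphviz draw them. The following is only a minimal sketch: the table names and the `JOINS` list are hypothetical, the join list is written by hand rather than parsed from SQL, and the choice of solid edges for INNER and dashed edges for LEFT is just one possible convention.

```python
# Hypothetical join list: (left table, right table, join type).
JOINS = [
    ("orders", "customers", "INNER"),
    ("orders", "coupons", "LEFT"),
]

# One possible visual convention: solid line = INNER, dashed line = LEFT.
EDGE_STYLES = {"INNER": "solid", "LEFT": "dashed"}


def joins_to_dot(joins):
    """Render a list of joins as Graphviz DOT text."""
    lines = ["digraph query {"]
    for left, right, kind in joins:
        style = EDGE_STYLES[kind]
        lines.append(f'  "{left}" -> "{right}" [label="{kind}", style={style}];')
    lines.append("}")
    return "\n".join(lines)


dot_text = joins_to_dot(JOINS)
print(dot_text)
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command, for example `dot -Tpng query.dot -o query.png`.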

SQL to RDF according to a specific class

I have a database and I want to convert it to RDF.
For now I just want to convert the data from one table to RDF (later I will do the same with the whole database).
I have access to that table from SQL Developer; it is an Oracle SQL table.
It has many columns, but I am mainly interested in
the ID column and the name column,
so the data looks like this:
ID name
1 oneNameSomething
2 anotherDicName
but the table has 34,000 rows.
Is it possible to do something like:
prefix:idValueFromTable a SpecificClass .
prefix:idValueFromTable prefix:hasName prefix:NameFromTable
I know I can build a tool to do that, but a quick search on the internet (https://www.w3.org/wiki/ConverterToRdf#SQL) suggests there are already tools. I don't know which one works in my case, so I thought I'd ask you first.
This page doesn't have the tool
https://www.w3.org/TR/r2rml/#overview
For a "one-off" translation, you could just execute an SQL query, and then get the resulting rows, and write output as Turtle or N-Triples. That would probably be the quickest way to get some RDF.
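A rough sketch of that one-off approach in Python, under some loud assumptions: an in-memory SQLite table stands in for the Oracle table (with Oracle you would open the connection through an Oracle driver instead), the `ex:` prefix and `http://example.org/` namespace are placeholders, and the name is written as a string literal rather than as a prefixed resource, which is a choice of mine and not something the question requires.

```python
import sqlite3


def turtle_literal(text):
    """Escape a Python string for use as a double-quoted Turtle literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def rows_to_turtle(rows, base="http://example.org/"):
    """Yield Turtle lines for an iterable of (id, name) rows."""
    yield f"@prefix ex: <{base}> ."
    for row_id, name in rows:
        yield f"ex:{row_id} a ex:SpecificClass ;"
        yield f"    ex:hasName {turtle_literal(name)} ."


# Stand-in data; the real rows would come from the Oracle table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "oneNameSomething"), (2, "anotherDicName")],
)

cursor = conn.execute("SELECT id, name FROM items ORDER BY id")
turtle = "\n".join(rows_to_turtle(cursor))
print(turtle)
```

Because the rows are streamed from the cursor, 34,000 rows should not be a problem; for a large table you would write each line to a file instead of joining them in memory.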
If you need a more principled approach, and for more than just one table (and with cross-referencing between tables), I'd look into some of the tools for mapping relational databases to RDF, such as D2RQ. The mapping schemes are relatively flexible, and you can get RDF back without a whole lot of setup.
To get the data back out, you could use a SPARQL construct query, if you want to precisely control what you get, or (as I just learned from your comment), you can use the dump-rdf tool to get an RDF dump of the database.

What's the common practice in constituting the WHERE clause based on the user input

If we take a database table, we can query all the rows or we can choose to apply a filter on it. The filter can vary depending on the user input. When there are only a few options, we can write a separate query for each specific condition. But if there are lots and lots of options that the user might or might not specify, that method does not come in handy.
I know I can compose the filter from the user input, send it as a string parameter to the corresponding stored procedure, build the query with that filter, and finally execute the query string with the help of EXECUTE IMMEDIATE (in Oracle's case). I don't know why, but I really don't like this way of building queries. I think it leaves the door open to SQL injection. Besides that, I always have trouble with the query itself, as everything is just a string and I need to handle dates and numbers carefully.
What is the best and most used method of forming the WHERE clause of a query against a database table?
Using database parameters instead of attempting to quote your literals is the way forward.
This will guard you against SQL injection.
A common way of approaching this problem is building expression trees that represent your query criteria, converting them to parameterized SQL (to avoid SQL injection risks), binding parameter values to the generated SQL, and executing the resultant query against your target database.
The exact approach depends on your client programming framework: .NET has Entity Framework and LINQ2SQL, which both support expression trees; Java has Hibernate and JPA, and so on. I have seen several different frameworks used to construct customizable queries, with a great deal of success. In situations when these frameworks are not available, you can roll your own, although it requires a lot more work.
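To give an idea of what "rolling your own" might look like, here is a deliberately tiny sketch in Python. The `Cmp` and `And` node classes, the `to_sql` function, and the `products` table are all hypothetical names invented for illustration, and SQLite's `?` placeholders stand in for whatever bind syntax your database uses (Oracle uses named binds such as `:name`).

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Cmp:
    """Leaf node: column <op> value."""
    column: str
    op: str
    value: object


@dataclass
class And:
    """Inner node: all parts must hold."""
    parts: list


# Only identifiers and operators listed here ever reach the SQL text.
ALLOWED_COLUMNS = {"name", "price"}
ALLOWED_OPS = {"=", "<", ">", "<=", ">=", "LIKE"}


def to_sql(node):
    """Return (sql_fragment, params) for an expression tree node."""
    if isinstance(node, Cmp):
        if node.column not in ALLOWED_COLUMNS or node.op not in ALLOWED_OPS:
            raise ValueError("unsupported column or operator")
        # The value is never interpolated; it travels as a bound parameter.
        return f"{node.column} {node.op} ?", [node.value]
    if isinstance(node, And):
        if not node.parts:
            return "1 = 1", []
        fragments, params = [], []
        for part in node.parts:
            fragment, part_params = to_sql(part)
            fragments.append(f"({fragment})")
            params.extend(part_params)
        return " AND ".join(fragments), params
    raise TypeError(f"unknown node type: {type(node).__name__}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("apple", 1.5), ("avocado", 12.0), ("pear", 2.0)],
)

criteria = And([Cmp("price", "<", 10), Cmp("name", "LIKE", "a%")])
where_sql, params = to_sql(criteria)
rows = conn.execute(
    f"SELECT name FROM products WHERE {where_sql} ORDER BY name", params
).fetchall()
print(where_sql, params, rows)
```

Dates and numbers are passed as typed values in `params`, so the driver handles their conversion and there is no string formatting of literals to get wrong.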

Would keeping XML data inside an SQL table be an architectural misconception?

I've got an SQL table that I use to keep product data. Some products have other data attached to them (for example, books have a number of pages and a cover type; movies have a running time; etc.).
I could use a separate table in SQL to keep those, storing (name, value) pairs.
I can also just keep XML-packed data in a single field in a table. It's not a normalized approach, but it seems more natural to me.
I did a similar thing in a shopping basket application. We needed to attach meta data to the products without creating too much of a schema, which would have restricted the format of the meta-data in the future. We kept the meta-data as XML.
The only reason I would not do it is if you're going to end up performing queries on the data. Just make sure you won't have some daft person wanting reports by Publisher meta-data or something (which has happened to me) and you should be fine.
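To make that trade-off concrete, here is a small sketch using Python with SQLite as a stand-in database. The `products` table, the `meta_xml` column, and the element names are hypothetical. Reading one product's metadata is easy, but filtering by metadata means parsing every row in application code, unless your database has XML functions you can use instead.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, meta_xml TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (1, "Some Book", "<meta><pages>320</pages><cover>hard</cover></meta>"),
        (2, "Some Movie", "<meta><minutes>95</minutes></meta>"),
    ],
)


def product_meta(conn, product_id):
    """Load one product's metadata as a dict: cheap and simple."""
    (xml_text,) = conn.execute(
        "SELECT meta_xml FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


def products_with(conn, tag):
    """Find products that have a given metadata element.

    Every row is fetched and parsed here, because plain SQL cannot
    look inside the XML text.
    """
    rows = conn.execute("SELECT id, meta_xml FROM products ORDER BY id")
    return [
        product_id
        for product_id, xml_text in rows
        if ET.fromstring(xml_text).find(tag) is not None
    ]


print(product_meta(conn, 1))
print(products_with(conn, "pages"))
```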
If you were intending to use XML as a way of not properly defining database tables, that would indeed be an architectural cop-out. I'm not sure about your scenario; it seems dangerously close to that. But key-value pairs are probably worse.
The best thing is to use a specialist XML datatype, if your database has one. In addition to RageZ's list, Oracle has had an XMLType for ten years now (since 9i). The advantage of using XMLType is two-fold. It announces to the casual observer that the documents in this column are XML. It also gives you access to built-in functionality, such as validation with XML Schemas, should you want it. Other features could prove handy if you subsequently have to start referring to the contents of the XML. For instance, Oracle's XDB supports an XML index type which can dramatically improve the performance of XPath queries.
It depends!
If you expect the 'shape' of your products to vary greatly then XML is a good way to go. [If you are using SQL Server you can index an XML field.]
I don't think it's an architectural misconception. Just make sure you don't want to use that data in a query, because it's going to be complex.
Plus, recent RDBMSs have functions to handle XML (MSSQL, Postgres, MySQL), so you would still be able to use that data.

How to create dynamic and safe queries

A "static" query is one that remains the same at all times. For example, the "Tags" button on Stackoverflow, or the "7 days" button on Digg. In short, they always map to a specific database query, so you can create them at design time.
But I am trying to figure out how to do "dynamic" queries where the user basically dictates how the database query will be created at runtime. For example, on Stackoverflow, you can combine tags and filter the posts in ways you choose. That's a dynamic query albeit a very simple one since what you can combine is within the world of tags. A more complicated example is if you could combine tags and users.
First of all, when you have a dynamic query, it sounds like you can no longer use the substitution api to avoid sql injection since the query elements will depend on what the user decided to include in the query. I can't see how else to build this query other than using string append.
Secondly, the query could potentially span multiple tables. For example, if SO allows users to filter based on Users and Tags, and these probably live in two different tables, building the query gets a bit more complicated than just appending columns and WHERE clauses.
How do I go about implementing something like this?
The first rule is that users are allowed to specify values in SQL expressions, but not SQL syntax. All query syntax should be literally specified by your code, not user input. The values that the user specifies can be provided to the SQL as query parameters. This is the most effective way to limit the risk of SQL injection.
Many applications need to "build" SQL queries through code, because as you point out, some expressions, table joins, order by criteria, and so on depend on the user's choices. When you build a SQL query piece by piece, it's sometimes difficult to ensure that the result is valid SQL syntax.
I worked on a PHP class called Zend_Db_Select that provides an API to help with this. If you like PHP, you could look at that code for ideas. It doesn't handle any query imaginable, but it does a lot.
Some other PHP database frameworks have similar solutions.
Though not a general solution, here are some steps that you can take to mitigate the dynamic yet safe query issue.
Criteria in which a column value must belong to a set of values of arbitrary size do not need to be dynamic. Consider using either the instr function or a special filtering table that you join against. This approach can easily be extended to multiple columns, as long as the number of columns is known. Filtering on users and tags could easily be handled this way.
When the number of columns in the filtering criteria is arbitrary yet small, consider using different static queries for each possibility.
Only when the number of columns in the filtering criteria is arbitrary and potentially large should you consider using dynamic queries. In which case...
To be safe from SQL injection, either build or obtain a library that defends against that attack. Though more difficult, this is not an impossible task. This is mostly about escaping SQL string delimiters in the values to filter for.
To be safe from expensive queries, consider using views that are specially crafted for this purpose and some up front logic to limit how those views will get invoked. This is the most challenging in terms of developer time and effort.
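The filtering-table idea from the first step can be sketched like this in Python with SQLite. The `posts` and `wanted_tags` tables are hypothetical. The point is that the query text never changes; only the contents of the filtering table do, and those are inserted through bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY, tag TEXT);
    INSERT INTO posts VALUES (1, 'sql'), (2, 'python'), (3, 'xml'), (4, 'sql');
    CREATE TEMP TABLE wanted_tags (tag TEXT PRIMARY KEY);
    """
)

# The query is fixed at design time, however many tags the user picks.
STATIC_QUERY = (
    "SELECT p.id FROM posts p "
    "JOIN wanted_tags w ON w.tag = p.tag "
    "ORDER BY p.id"
)


def posts_with_tags(conn, tags):
    """Refill the filtering table, then run the static query."""
    conn.execute("DELETE FROM wanted_tags")
    conn.executemany(
        "INSERT OR IGNORE INTO wanted_tags (tag) VALUES (?)",
        [(tag,) for tag in tags],
    )
    return [row[0] for row in conn.execute(STATIC_QUERY)]


print(posts_with_tags(conn, ["sql", "xml"]))
print(posts_with_tags(conn, ["python"]))
```

In a multi-user application the filtering table would need to be per session (as a temporary table is here) or keyed by a request identifier, so that concurrent requests do not see each other's values.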
If you were using Python to access your database, I would suggest you use the Django model system. There are many similar APIs both for Python and for other languages (notably in Ruby on Rails). I am saving so much time by avoiding the need to talk directly to the database with SQL.
From the example link:
# Model definition
from django.db import models

class Blog(models.Model):
    name = models.CharField(max_length=100)
    tagline = models.TextField()

    def __unicode__(self):
        return self.name

Model usage (this is effectively an insert statement):
from mysite.blog.models import Blog
b = Blog(name='Beatles Blog', tagline='All the latest Beatles news.')
b.save()
The queries get much more complex: you pass around a query object and you can add filters and sort elements to it. When you are finally ready to use the query, Django creates an SQL statement that reflects all the ways you adjusted the query object. I think that it is very cute.
Other advantages of this abstraction
Your models can be created as database tables with foreign keys and constraints by Django
Many databases are supported (PostgreSQL, MySQL, SQLite, etc.)
Django analyses your models and creates an automatic admin site out of them.
Well the options have to map to something.
Building the SQL query string by concatenation isn't a problem if you still use parameters for the values of the options.