Grails & GORM: How to specify CREATE INDEX equivalent on a domain-class?

May I share my frustration: GORM (and Grails) seems to have VERY limited documentation regarding database indices. I haven't found any help anywhere on how to create an index for a domain class when I want the index to be something more than what is documented here: http://grails.org/doc/latest/guide/GORM.html.
Here, in SQL, is what I would like to achieve the Grails way:
CREATE INDEX very_fast_index
ON slow_table(date DESC NULLS LAST)
WHERE is_latest = true;
It seems I can tell GORM to create an index for the date column, but there appear to be ZERO options for adding the other criteria.
Since I hate it when simple things are made extremely complicated, I've created these indices manually in the PostgreSQL CLI rather than from Grails, which would have been more portable. I don't want to write any HQL either, as I don't like that idea.
Any ideas? I've got none other than the manual way.

HQL is a data manipulation language, not a data definition language, so it is not useful for your needs. If you want to use vendor-specific database features, you have to bypass Hibernate and use the lower-level JDBC connection to run the SQL yourself. In Grails you can use the dataSource bean to run such statements the Groovy way. Of course, this ties you to a specific database (in your case PostgreSQL).
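For example, a minimal sketch in plain JDBC (the class and method names are made up; in a Grails app you would typically run this from BootStrap.groovy with the injected dataSource bean):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

// Sketch only: executes the PostgreSQL-specific DDL from the question
// through the dataSource bean that Grails/Spring injects.
public class PartialIndexCreator {

    private final DataSource dataSource;

    public PartialIndexCreator(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void createIndex() throws SQLException {
        try (Connection conn = dataSource.getConnection();
             Statement stmt = conn.createStatement()) {
            // The partial index from the question, verbatim PostgreSQL syntax
            stmt.execute("CREATE INDEX very_fast_index "
                       + "ON slow_table(date DESC NULLS LAST) "
                       + "WHERE is_latest = true");
        }
    }
}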

Related

How should we name or map SQL tables from Postgres to Node?

We're switching over for new projects from .NET to Node and PostgreSQL.
Since Postgres best practices seem to be not relying on capitalization, we're naming DB columns with_underscores_to_separate_words instead of UsingPascalCase as we were with MSSQL and Linq.
Would best practice be:
1. To map all the columns to camelCase in the queries? (Tedious - that's what we're doing now, with multiple lines like "member_id as memberID" or "obj.memberID = dbObj.member_id".)
2. To automatically map camelCase variables in the code to underscore-separated SQL columns somehow?
3. To just give in to Postgres naming and have my objects returned from DB queries use underscore separation in my code? (Seems undesirable - then we have non-DB objects with camelCase and DB objects with underscore separation... messy.)
Would really like to use SQL queries instead of an ORM, but so far this is a sticking point.
We decided to go with Knex which automatically quotes all column names, so there's no casing issue. We're using camelCase and PascalCase for naming in Postgres so the DB is consistent with the code.
The disadvantage is that when running raw queries against Postgres, we need to quote the column names, which we can live with.
Edit: We're now using Objection's knexSnakeCaseMappers which handles this automatically in Postgres - camelCase in the code, snake_case in the DB. Very convenient.
Which one would you prefer to support? Mapping columns in the queries as in #1 is a lot of work now and in the future; automating it by passing source and result objects through humps or the like subtracts much of that constant effort, but it's another step and another place things could go wrong. The only strike against #3 is that it's a bit ugly. You can live with ugly -- turn off any camelCase lint rules and it'll barely register after a while.
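The transformation itself is mechanical; here is a sketch of what a humps-like mapper does, written in Java only to make the logic explicit (the class and method names are made up):

// Illustrative only: the snake_case <-> camelCase conversion that
// libraries such as humps or knexSnakeCaseMappers automate.
public class CaseMapper {

    // "member_id" -> "memberId"
    public static String snakeToCamel(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean upperNext = false;
        for (char c : s.toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else {
                out.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return out.toString();
    }

    // "memberId" -> "member_id"
    public static String camelToSnake(String s) {
        StringBuilder out = new StringBuilder(s.length() + 4);
        for (char c : s.toCharArray()) {
            if (Character.isUpperCase(c)) {
                out.append('_').append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}

Applied to every key of the source and result objects, this removes the per-query "member_id as memberID" boilerplate from option #1.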
You do have some options if you're dead set on casing; I know Sequelize supports switching between camelCase and snake_case if you configure your models appropriately, and for a much lower-level take there's an old driver plugin. I would, however, recommend just getting used to it.
I believe this covers all your points: pg-promise and case sensitivity in column names.
The easiest solution is to use underscore syntax for all your column names and then automatically convert those into camelCase, as shown in the example.
Would really like to use SQL queries instead of an ORM, but so far this is a sticking point.
And this is exactly what you get with pg-promise. And even better, you can nicely organize all your SQL within external SQL files, see Query Files and pg-promise-demo.
See also: the receive event.

Spring Data Neo4j search and index

I'm using:
neo4j 2.0.1
spring data neo4j 3.0.1.RELEASE
I have a node Person with the property name, and I would like to search on that property with Lucene syntax. I'm using the findByNameLike method in my repository, and that works perfectly for queries like value*, *value, or *.
But I need a query like {A* TO D*}. I found a deprecated method, findAllByQuery("name", query), and with that method I can achieve what I need.
I was wondering what the non-deprecated replacement method is that understands such query syntax.
I also noticed that if I create nodes from Cypher, the nodes are not available in my search.
With SDN I think the generated nodes are automatically added to the index as well, but I don't know how to check that or what the index name is. I have to generate nodes from Cypher to seed some base data into my system. Should I add some special property in my Cypher query?
First of all, make sure you understand the differences between the legacy (deprecated) Lucene indexes and the newer Schema indexes.
I was wondering what the non-deprecated replacement method is that understands such query syntax.
You'll have to use one of the methods of the SchemaIndexRepository, which is extended by the GraphRepository interface for convenience. Bear in mind that wildcard searches, for example, are not yet implemented on schema indexes. If you want wildcard searches, you have two options: continue to use the Lucene indexes (your best option for now), or use a regex query in a custom repository method, e.g.
MATCH (p:Person) WHERE p.name =~ ".*test.*" RETURN p
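Wrapped in a custom repository method, that could look roughly like this in SDN 3.x (a sketch; the repository and method names, and the Person entity, are assumptions):

import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.repository.GraphRepository;

// Sketch of a custom finder that runs the regex query; all names here
// are made up for the example.
public interface PersonRepository extends GraphRepository<Person> {

    @Query("MATCH (p:Person) WHERE p.name =~ {0} RETURN p")
    Iterable<Person> findByNameMatching(String regex);
}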
I also noticed that if I create nodes from Cypher, the nodes are not available in my search. With SDN I think the generated nodes are automatically added to the index as well, but I don't know how to check that or what the index name is. I have to generate nodes from Cypher to seed some base data into my system. Should I add some special property in my Cypher query?
If you're using Lucene indexes, new entries will not be added to the index automatically; AFAIK, you can only do that programmatically. Schema indexes can be created as follows:
CREATE INDEX ON :Person(name)
New entries with the name property will be automatically added to the index. Again, wildcard searches will not yet use these indexes.
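By contrast, adding an entry to a legacy Lucene index has to be done by hand. A minimal sketch against the embedded Neo4j 2.0 Java API (the index name "Person" is an assumption; SDN normally manages this for you):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;

public class LegacyIndexExample {

    // Sketch only: manually add a node to a legacy Lucene index so that
    // findAllByQuery-style lookups can see it. "Person" is an assumed name.
    public static void addToIndex(GraphDatabaseService graphDb) {
        try (Transaction tx = graphDb.beginTx()) {
            Node person = graphDb.createNode();
            person.setProperty("name", "Alice");

            Index<Node> index = graphDb.index().forNodes("Person");
            index.add(person, "name", person.getProperty("name"));

            tx.success();
        }
    }
}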

Portable SQL : unique primary keys

I'm trying to develop something that should be portable between the bigger RDBMSes.
The issue is around generating and using auto-increment numbers as the primary key for a table.
There are two topics here:
1. The mechanism used to generate the auto-increment numbers.
2. How to specify that you want to use this as the primary key on a table.
I'm looking for verification for what I think is the current state of affairs:
Unfortunately, standardization came late to this area and in some respects is still not implemented (as a mandatory standard). This means that even in 2013 it is impossible to write a CREATE TABLE statement in a portable way ... if you want it with an auto-generated primary key.
Can this really be so?
Re (1). This is standardized, because it came in SQL:2003. As far as I understand, the way to go is SEQUENCEs. I believe these are a mandatory part of SQL:2003, right? The other possibility is the IDENTITY keyword, which is also defined in SQL:2003, but that one is - as far as I can tell - an optional part of the standard ... which means a key player like Oracle doesn't implement it... and can still claim compliance. OK, so SEQUENCEs are the designated portable method for this, right?
Re (2). Database vendors implement this in different ways. In PostgreSQL you can link the CREATE TABLE statement directly with the sequence; in Oracle you would have to create a trigger to ensure the SEQUENCE is used with the table.
So my conclusion is that without a standardized solution to (2) it really doesn't help much that all the major players now support SEQUENCEs. I would still have to write db-specific code for something as simple as a CREATE TABLE statement.
Is this right?
Standards and their implementation aside, I would also be interested in whether anyone has a portable solution to the problem, no matter if it is a hack from an RDBMS best-practice perspective. For such a solution to work it would have to be independent of any application, i.e. it must be the database that solves the issue, not the application layer. Perhaps if both the concept of TRIGGERs and SEQUENCEs can be said to be standardized, then a solution that combines the two of them would be portable?
As for "portable create table statements": It starts with the data types: Whether boolean, int or long data types are part of any SQL standard or not, I really appreciate these types. PostgreSql supports these data types, Oracle does not. Ironically Oracle supports boolean in PL/SQL, but not as a data type in a table. Even the length of table/column names etc. are restricted in Oracle to 30 characters. So not even the most simple "create table" is always portable.
As for auto-generated primary keys: I am not aware of a portable syntax, so I do not define this in the CREATE TABLE. Of course this only delays the problem and leaves it to the INSERT statements. This topic is connected with another problem: getting the generated key after an INSERT using JDBC in the most efficient way (see the update below). This differs substantially between Oracle and PostgreSQL, and if you have ever dared to use case-sensitive table/column names in Oracle, it won't be funny.
As for constraints, I prefer to add them in separate statements after the CREATE TABLE. The set of constraints may differ if you implement a boolean data type in Oracle using char(1) together with a check constraint, whereas PostgreSQL supports the data type directly.
As for "standards": One example
SQL99 standard: for SELECT DISTINCT, ORDER BY expressions must appear in select list
This message is from PostgreSql, Oracle 11g does not complain. After 14 years, will they change it?
Generally speaking, you still have to write database specific code.
As for your conclusion: in our scenario we implemented a portable database application using a model-driven approach. The logical metadata is used by the application, and there are different back ends for different database types. We do not use any ORM, just "direct SQL", because this simplifies tuning of SQL statements and gives full access to all SQL features. We wrote our own library, and later found out that the key ideas match those of "Anorm".
The good news is that while there are tons of small annoyances, it works pretty well, even with complex queries. For example, window aggregate functions (row_number(), partition by) are quite portable. But you have to use listagg on Oracle, whereas you need string_agg on PostgreSQL. Recursive common table expressions require "with recursive" in PostgreSQL, which Oracle does not like. PostgreSQL supports "limit" and "offset" in queries; you need to wrap this in Oracle. It drives you crazy if you use SQL arrays in both Oracle and PostgreSQL (arrays as columns in tables). There are materialized views in Oracle, but they do not exist in PostgreSQL. Surprisingly enough, it is possible to write database stored procedures not only in Java but in Scala, and this works amazingly well in both Oracle and PostgreSQL. This list is not complete, but so far we have managed to find an acceptable (= fast) solution for every "portability problem".
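To give one concrete example of such a wrapper, the paging difference is typically hidden behind a small dialect switch. A sketch (the method name and the isOracle flag are assumptions):

// Sketch: hide the LIMIT/OFFSET portability gap behind a dialect check.
static String pageQuery(String baseSql, boolean isOracle) {
    if (isOracle) {
        // Oracle 11g has no LIMIT/OFFSET; emulate it with ROWNUM.
        return "SELECT * FROM ("
             + " SELECT t.*, ROWNUM rn FROM (" + baseSql + ") t"
             + " WHERE ROWNUM <= ?"   // bind offset + limit
             + ") WHERE rn > ?";      // bind offset
    }
    // PostgreSQL understands LIMIT/OFFSET directly.
    return baseSql + " LIMIT ? OFFSET ?";
}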
Does it pay off? In our scenario, there is a central Oracle installation (RAC, read/write), but there are distributed PostgreSQL installations as localhost databases on each application server (read-only). This gives a big performance and scalability boost, without the cost penalty.
If you really want to have it solved in the database only, there is one possibility: put everything in stored procedures, write these in Java/Scala, and restrict the application to calling these procedures and reading the result sets. This of course just moves the complexity from the application layer into the database, but you said you accepted hacks :-)
Triggers are quite standardized if you use Java stored procedures - and if they are supported by your databases, your management, your data center people, and your colleagues. The non-technical/social aspects have to be considered as well. I have even heard of database tuning people who do not accept the general "left outer join" syntax; they insisted on the Oracle way of using "(+)".
So even if triggers (PL/SQL) and sequences were standardized, there would be so many other things to consider.
Update
As for returning the generated primary keys, I can only judge the situation from JDBC's perspective.
PostgreSQL returns it if you use Statement.getGeneratedKeys (I consider this the normal way).
Oracle requires you to specify the (primary key) column(s) whose values you want back explicitly when you create the prepared statement. This works, but only if you are not using case-sensitive table names; in that case, all you receive is a misleading "ORA-00942: table or view does not exist" thrown in Oracle's JDBC driver. There was/is a bug in Oracle's JDBC driver, and I have not found a way to get the value using a portable JDBC method. So at the cost of an additional proprietary "select sequence.currVal from dual" within the same transaction right after the insert, you can get the primary key back. The additional time was acceptable in our case; we compared the times to insert 100,000 rows: PostgreSQL is faster up to about the 10,000th row, after that Oracle performs better.
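In JDBC terms the difference looks roughly like this (a sketch; the table and column names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class GeneratedKeyExample {

    // Sketch only: the same insert, with the two vendor-specific ways
    // of asking for the generated key. Table/column names are made up.
    static long insertAndReturnKey(Connection conn, boolean isOracle,
                                   String name) throws SQLException {
        PreparedStatement ps = isOracle
                // Oracle: the key column(s) must be named explicitly
                ? conn.prepareStatement("INSERT INTO person(name) VALUES (?)",
                                        new String[] { "ID" })
                // PostgreSQL: the generic flag is enough
                : conn.prepareStatement("INSERT INTO person(name) VALUES (?)",
                                        Statement.RETURN_GENERATED_KEYS);
        ps.setString(1, name);
        ps.executeUpdate();
        try (ResultSet keys = ps.getGeneratedKeys()) {
            keys.next();
            return keys.getLong(1);
        }
    }
}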
See a Stack Overflow question regarding the ways to get the primary key, and the bug report about case-sensitive table names from 2008.
This example shows the problems pretty well. Normally PostgreSQL works the way you expect it to, but you may have to find a special way for Oracle.

What's the common practice for constructing the WHERE clause based on user input

If we take a database table, we can query all the rows or we can choose to apply a filter. The filter can vary depending on the user input. When there are only a few options, we can write a separate query for each specific condition; but if there are lots and lots of options that the user might or might not specify, that method is no longer practical. I know I can compose the filter from the user input, send it as a string parameter to the corresponding stored procedure, build the query with that filter, and finally execute the query string with EXECUTE IMMEDIATE (in Oracle's case). I don't know why, but I really don't like this way of query building: I think it leaves the door open for SQL injection. Besides, I always have trouble with the query itself, as everything is just a string and I need to handle dates and numbers carefully.
What is the best and most used method of forming the WHERE clause of a query against a database table?
Using database parameters instead of attempting to quote your literals is the way forward.
This will guard you against SQL injection.
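In JDBC, for example, that simply means binding the user input instead of concatenating it (a minimal sketch; the table and column names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;

public class ParameterizedQueryExample {

    // Sketch: the driver sends the bound values as data, never as SQL
    // text, so malicious input cannot change the statement's structure.
    static ResultSet findCustomers(Connection conn, String name,
                                   LocalDate from) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
                "SELECT * FROM customers WHERE last_name = ? AND created_on > ?");
        ps.setString(1, name);                       // bound, not quoted
        ps.setDate(2, java.sql.Date.valueOf(from));  // dates typed, not formatted
        return ps.executeQuery();
    }
}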
A common way of approaching this problem is building expression trees that represent your query criteria, converting them to parameterized SQL (to avoid SQL injection risks), binding parameter values to the generated SQL, and executing the resultant query against your target database.
The exact approach depends on your client programming framework: .NET has Entity Framework and LINQ to SQL, both of which support expression trees; Java has Hibernate and JPA, and so on. I have seen several different frameworks used to construct customizable queries with a great deal of success. In situations where these frameworks are not available, you can roll your own, although that requires a lot more work.
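When no such framework is at hand, rolling your own can be as simple as collecting SQL fragments and parameter values side by side, then binding them in order. A sketch (all names are made up):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch of a hand-rolled criteria builder: the SQL fragments are fixed
// strings chosen by code; only the values come from the user.
public class CustomerQuery {

    private final List<String> conditions = new ArrayList<>();
    private final List<Object> params = new ArrayList<>();

    public CustomerQuery nameLike(String name) {
        if (name != null && !name.isEmpty()) {
            conditions.add("name LIKE ?");
            params.add("%" + name + "%");
        }
        return this;
    }

    public CustomerQuery minAge(Integer age) {
        if (age != null) {
            conditions.add("age >= ?");
            params.add(age);
        }
        return this;
    }

    public PreparedStatement prepare(Connection conn) throws SQLException {
        StringBuilder sql = new StringBuilder("SELECT * FROM customers");
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        PreparedStatement ps = conn.prepareStatement(sql.toString());
        for (int i = 0; i < params.size(); i++) {
            ps.setObject(i + 1, params.get(i)); // parameter indexes are 1-based
        }
        return ps;
    }
}

Only the filters the user actually supplied end up in the WHERE clause, and every value travels as a bound parameter.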

How to create dynamic and safe queries

A "static" query is one that remains the same at all times. For example, the "Tags" button on Stackoverflow, or the "7 days" button on Digg. In short, they always map to a specific database query, so you can create them at design time.
But I am trying to figure out how to do "dynamic" queries where the user basically dictates how the database query will be created at runtime. For example, on Stackoverflow, you can combine tags and filter the posts in ways you choose. That's a dynamic query albeit a very simple one since what you can combine is within the world of tags. A more complicated example is if you could combine tags and users.
First of all, when you have a dynamic query, it sounds like you can no longer use the substitution API to avoid SQL injection, since the query elements depend on what the user decided to include in the query. I can't see how else to build such a query other than by appending strings.
Secondly, the query could potentially span multiple tables. For example, if SO allows users to filter based on Users and Tags, and these probably live in two different tables, building the query gets a bit more complicated than just appending columns and WHERE clauses.
How do I go about implementing something like this?
The first rule is that users are allowed to specify values in SQL expressions, but not SQL syntax. All query syntax should be literally specified by your code, not user input. The values that the user specifies can be provided to the SQL as query parameters. This is the most effective way to limit the risk of SQL injection.
Many applications need to "build" SQL queries through code, because as you point out, some expressions, table joins, order by criteria, and so on depend on the user's choices. When you build a SQL query piece by piece, it's sometimes difficult to ensure that the result is valid SQL syntax.
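For the parts that are genuinely syntax, such as a sort column, the usual trick is a whitelist: the user picks a key, and the code maps it to a fixed SQL fragment. A sketch (all names are made up):

import java.util.Map;

// Sketch: user input selects from a whitelist of syntax elements;
// anything unrecognized falls back to a safe default.
public class SortClause {

    private static final Map<String, String> SORTABLE = Map.of(
            "created", "created_on",
            "votes",   "vote_count",
            "title",   "title");

    public static String orderBy(String userChoice, boolean descending) {
        // The user chooses a key; the actual column name comes from our map,
        // so no user-controlled text ever reaches the SQL string.
        String column = SORTABLE.getOrDefault(userChoice, "created_on");
        return " ORDER BY " + column + (descending ? " DESC" : " ASC");
    }
}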
I worked on a PHP class called Zend_Db_Select that provides an API to help with this. If you like PHP, you could look at that code for ideas. It doesn't handle any query imaginable, but it does a lot.
Some other PHP database frameworks have similar solutions.
Though not a general solution, here are some steps you can take to mitigate the dynamic-yet-safe query issue.
Criteria in which a column value belongs to a set of values of arbitrary cardinality do not need to be dynamic. Consider using either the instr function or a special filtering table that you join against (see the sketch after this list). This approach extends easily to multiple columns as long as the number of columns is known. Filtering on users and tags could easily be handled with this approach.
When the number of columns in the filtering criteria is arbitrary yet small, consider using different static queries for each possibility.
Only when the number of columns in the filtering criteria is arbitrary and potentially large should you consider using dynamic queries. In which case...
To be safe from SQL injection, either build or obtain a library that defends against that attack. Though more difficult, this is not an impossible task. This is mostly about escaping SQL string delimiters in the values to filter for.
To be safe from expensive queries, consider using views that are specially crafted for this purpose and some up front logic to limit how those views will get invoked. This is the most challenging in terms of developer time and effort.
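The filtering-table technique from the first step might look roughly like this (a sketch; the filter_values staging table and all other names are assumptions):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

public class TagFilter {

    // Sketch only: stage the user's values with a batch insert, then
    // join against them. The staging table is assumed to exist.
    static ResultSet postsWithTags(Connection conn, String sessionId,
                                   List<String> tags) throws SQLException {
        try (PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO filter_values(session_id, value) VALUES (?, ?)")) {
            for (String tag : tags) {
                ins.setString(1, sessionId);
                ins.setString(2, tag);  // values are bound, never concatenated
                ins.addBatch();
            }
            ins.executeBatch();
        }
        PreparedStatement q = conn.prepareStatement(
                "SELECT DISTINCT p.* FROM posts p"
              + " JOIN post_tags pt ON pt.post_id = p.id"
              + " JOIN filter_values f ON f.value = pt.tag"
              + " WHERE f.session_id = ?");
        q.setString(1, sessionId);
        return q.executeQuery();
    }
}

No matter how many values the user supplies, the query text never changes; only the rows in the staging table do.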
If you were using Python to access your database, I would suggest the Django model system. There are many similar APIs, both for Python and for other languages (notably in Ruby on Rails). I am saving so much time by avoiding the need to talk directly to the database with SQL.
From the example link:
# Model definition
class Blog(models.Model):
    name = models.CharField(max_length=100)
    tagline = models.TextField()

    def __unicode__(self):
        return self.name
Model usage (this is effectively an insert statement)
from mysite.blog.models import Blog
b = Blog(name='Beatles Blog', tagline='All the latest Beatles news.')
b.save()
The queries get much more complex: you pass around a query object, and you can add filters / sort elements to it. When you are finally ready to use the query, Django creates a SQL statement that reflects all the ways you adjusted the query object. I think that it is very cute.
Other advantages of this abstraction
Your models can be created as database tables, with foreign keys and constraints, by Django.
Many databases are supported (PostgreSQL, MySQL, SQLite, etc.).
Django analyses your models and creates an automatic admin site from them.
Well, the options have to map to something.
Concatenating a SQL query string isn't a problem as long as you still use parameters for the option values.