What's the common practice in constituting the WHERE clause based on the user input - sql

If we take a database table, we can query all the rows or we can apply a filter to it. The filter can vary depending on the user input. When there are only a few options, we can write a separate query for each specific condition. But if there are many options that the user might or might not specify, that method does not scale. I know I can compose the filter from the user input, send it as a string parameter to the corresponding stored procedure, build the query with that filter, and finally execute the query string with EXECUTE IMMEDIATE (in Oracle's case). I really don't like this way of building queries: I think it leaves the door open to SQL injection. Besides, I always have trouble with the query itself, because everything is just a string and I need to handle dates and numbers carefully. What is the best and most widely used method of forming the WHERE clause of a query against a database table?

Using database parameters instead of attempting to quote your literals is the way forward.
This will guard you against SQL injection.

A common way of approaching this problem is building expression trees that represent your query criteria, converting them to parameterized SQL (to avoid SQL injection risks), binding parameter values to the generated SQL, and executing the resultant query against your target database.
The exact approach depends on your client programming framework: .NET has Entity Framework and LINQ2SQL, which both support expression trees; Java has Hibernate and JPA; and so on. I have seen several different frameworks used to construct customizable queries with a great deal of success. When these frameworks are not available, you can roll your own, although that requires a lot more work.
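As a rough illustration of the "roll your own" route, here is a minimal Python sketch of an expression tree that is converted to parameterized SQL and run with the standard sqlite3 module. The node classes (Compare, And, Or), the whitelists, and the person table are all hypothetical names invented for this example; a real framework does far more.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, List, Tuple, Union


@dataclass
class Compare:
    """Leaf node: <column> <op> <value>."""
    column: str
    op: str
    value: Any


@dataclass
class And:
    parts: List["Node"]


@dataclass
class Or:
    parts: List["Node"]


Node = Union[Compare, And, Or]

# Hypothetical whitelists: only these identifiers and operators may reach the SQL text.
ALLOWED_COLUMNS = {"name", "age", "city"}
ALLOWED_OPS = {"=", "<", ">", "<=", ">=", "LIKE"}


def to_sql(node: Node) -> Tuple[str, list]:
    """Return (sql_fragment, parameters) for an expression tree."""
    if isinstance(node, Compare):
        if node.column not in ALLOWED_COLUMNS or node.op not in ALLOWED_OPS:
            raise ValueError("unsupported column or operator")
        # The value never enters the SQL text; it is bound as a parameter.
        return f"{node.column} {node.op} ?", [node.value]
    if not node.parts:
        raise ValueError("And/Or needs at least one child")
    joiner = " AND " if isinstance(node, And) else " OR "
    fragments, params = [], []
    for part in node.parts:
        sql, values = to_sql(part)
        fragments.append(sql)
        params.extend(values)
    return "(" + joiner.join(fragments) + ")", params


# city = 'Oslo' AND (age < 30 OR name LIKE 'A%')
tree = And([
    Compare("city", "=", "Oslo"),
    Or([Compare("age", "<", 30), Compare("name", "LIKE", "A%")]),
])
where_sql, where_params = to_sql(tree)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("Anna", 41, "Oslo"), ("Bob", 25, "Oslo"), ("Carl", 50, "Oslo"), ("Dan", 20, "Bergen")],
)
rows = conn.execute(
    "SELECT name FROM person WHERE " + where_sql + " ORDER BY name", where_params
).fetchall()
```

The only strings concatenated into the statement are column names and operators that passed the whitelist; everything the user typed travels separately as a bound value.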

Related

How to make dynamic logical statements in memsql?

I am trying to figure out if there is a way of creating a query that is composed of dynamic logical statements (AND and OR operators) in a configurable and persistent manner.
Say I want to define a set of events and bundle them under an entity called a feature, so each feature is composed of events.
For example,
featureA is eventA and eventB,
featureB is (eventB and eventC) or eventD
I was considering:
storing an S-expression in a JSON column, then parsing it into a query
writing the WHERE clause by hand, storing it in a text column, and running it later, with a view presenting the data more readably
Then I realised I can't execute (as with eval) stored strings, as mentioned here.
So it comes down to what I was trying to avoid, which is running and manipulating everything via client-side querying. I need a pure SQL solution for further use by our data analysts.
Any suggestions?
You can execute dynamic SQL statements with https://docs.memsql.com/sql-reference/v6.7/execute-immediate/; see that page for some examples. (Prepared statements are a different topic; I don't think they are related to what you are looking for.)
You may also be interested in https://docs.memsql.com/concepts/v6.7/persistent-computed-columns/, which allows you to define columns that are computed as SQL expressions from other columns, so you could define your features this way.

Rails - find_by_sql vs active record querying

I would like to use the find_by_sql method in place of the active record query interface because of the flexibility I get in writing my own SQL.
For example,
It is considered a good practice in SQL to order the tables in your query from smallest to largest. When using the active record query interface, you will have to start with the result model which could be the largest table.
I can also easily avoid the N+1 problem by including the target table in the join itself and including the required columns from multiple tables.
I would like to know if there are reasons I should not be using the find_by_sql option, especially when I will be writing ANSI SQL that should be compatible with all, if not most databases.
Thanks!
Writing SQL code directly is normally not encouraged because it prevents you from using features such as lazy execution of SQL queries and may lead to invalid or unsafe queries.
In fact, ActiveRecord automatically converts Ruby objects into the corresponding SQL (such as ranges into BETWEEN or arrays into IN) and sanitizes strings against SQL injection, and ActiveRecord::Relation provides lazy query execution.
That said, if you know what you are doing, or if a specific query would be extremely complex to express in ActiveRecord, there is nothing wrong with writing your own SQL.
My suggestion is to avoid doing that for queries that can easily be reproduced in ActiveRecord and AREL.
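To make the "ranges into BETWEEN, arrays into IN" idea concrete outside of Rails, here is a small Python sketch of the same kind of conversion. It is not ActiveRecord's implementation; the condition and where helpers are hypothetical, and the column names are assumed to come from your own code rather than from user input.

```python
from typing import Any, Dict, List, Tuple


def condition(column: str, value: Any) -> Tuple[str, List[Any]]:
    """Pick an SQL operator from the Python type of the value."""
    if isinstance(value, range):
        # Assumes a non-empty range with step 1; range excludes its stop value.
        return f"{column} BETWEEN ? AND ?", [value.start, value.stop - 1]
    if isinstance(value, (list, tuple)):
        # Assumes a non-empty sequence; many databases reject an empty IN ().
        placeholders = ", ".join("?" for _ in value)
        return f"{column} IN ({placeholders})", list(value)
    if value is None:
        return f"{column} IS NULL", []
    return f"{column} = ?", [value]


def where(filters: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Join one condition per filter with AND and collect the bound values."""
    fragments, params = [], []
    for column, value in filters.items():
        sql, values = condition(column, value)
        fragments.append(sql)
        params.extend(values)
    return " AND ".join(fragments), params


where_sql, where_params = where(
    {"age": range(18, 66), "status": ["active", "trial"], "deleted_at": None}
)
```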

writing search queries from an application

I need to search for an entity in my application.
The search panel consists of 8 fields and a user may fill any field he wants.
Will I have to write queries for all possible combinations or what is the right way to do this?
There are many ways to achieve this, though the easiest one is to use dynamic SQL.
This has an associated cost in readability, exposure to SQL Injection and query plan caching and optimization.
As for where you build up the dynamic SQL string - you could do this in a stored procedure if your database supports those - that would be my preference, as you could pass in parameters and have that bit of extra protection against SQL Injection.
You could also build it up in the application, but this would require you to be more careful about the data.
The basic technique would be to build up the different parts of the WHERE clause for the different search fields (possibly starting with WHERE 1 = 1 so you can just add AND clauses without needing to check each clause to see if it is the first one).
I suggest reading through the dynamic SQL article I linked to.
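The WHERE 1 = 1 technique described above might look like this when the string is built in the application. This is a sketch using Python's sqlite3 module; the customer table and the three search fields are made up for the example, and every user value is passed as a bound parameter rather than spliced into the string.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_search(name: Optional[str] = None,
                 city: Optional[str] = None,
                 min_age: Optional[int] = None) -> Tuple[str, List[object]]:
    """Append one AND clause per search field that the user filled in."""
    sql = "SELECT name FROM customer WHERE 1 = 1"
    params: List[object] = []
    if name:
        sql += " AND name LIKE ?"
        params.append(f"%{name}%")
    if city:
        sql += " AND city = ?"
        params.append(city)
    if min_age is not None:
        sql += " AND age >= ?"
        params.append(min_age)
    return sql + " ORDER BY name", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("Alice", "Paris", 30), ("Alan", "Rome", 22), ("Bea", "Paris", 45)],
)

sql, params = build_search(name="al", min_age=25)
filtered = conn.execute(sql, params).fetchall()
everyone = conn.execute(*build_search()).fetchall()
```

Only fixed fragments written by the developer are concatenated, so the number of distinct statements stays bounded by the combinations of filled-in fields.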
The classic way to do this is:
SELECT
    <fields>
FROM tablename
WHERE
    (field1 LIKE '%[field1 user input]%' OR '[field1 user input]' = '')
    AND (field2 LIKE '%[field2 user input]%' OR '[field2 user input]' = '')
    ...
    AND (field8 LIKE '%[field8 user input]%' OR '[field8 user input]' = '')
Every sane query optimizer will optimize away the unneeded conditions.
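The same static "match or empty" pattern can be written with bind parameters instead of quoted user input, so the statement text never changes. The sketch below assumes SQLite (its || string concatenation and :name placeholders) and a hypothetical book table with two search fields; other databases spell concatenation and placeholders differently.

```python
import sqlite3

# One fixed statement: each condition is skipped when its parameter is empty.
SEARCH_SQL = """
SELECT title FROM book
WHERE (:title = '' OR title LIKE '%' || :title || '%')
  AND (:author = '' OR author LIKE '%' || :author || '%')
ORDER BY title
"""


def search(conn: sqlite3.Connection, title: str = "", author: str = ""):
    """Run the fixed query; empty strings mean 'no filter on this field'."""
    return conn.execute(SEARCH_SQL, {"title": title, "author": author}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("Dune", "Frank Herbert"), ("Emma", "Jane Austen"), ("Persuasion", "Jane Austen")],
)

by_author = search(conn, author="austen")
no_filter = search(conn)
# A quote in the input is just data here, not SQL syntax.
hostile = search(conn, title="' OR '1'='1")
```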

NHibernate problem choosing between CreateSql and CreateCriteria

I have a very basic question about NHibernate. There are three entities, two of which are related while the third is not related to the other two. I have to fetch some selected columns from these three tables by joining them. Is it a good idea to use session.CreateSql(), or do we have to use session.CreateCriteria()? I am confused because I could not write the Criteria queries and was forced to use CreateSql. Please advise.
In general, you should avoid writing SQL whenever possible.
One of the advantages of using an ORM is that it's implementation-agnostic.
That means that you don't know (and don't care) what the underlying database is, and you can switch DB providers or tweak the DB structure very easily.
If you write your own SQL statements, you run the risk of them not working on other providers, and you also have to maintain them yourself (for example, if you change the name of the underlying column for the Id property from 'Id' to 'Employee_Id', you'd have to change your SQL query, whereas with Criteria no change would be necessary).
Having said that, there's nothing stopping you from writing a Criteria or HQL query that pulls data from more than one table. For example (with HQL):
select emp.Id, dep.Name, po.Id
from Employee emp, Department dep, Posts po
where emp.Name like 'snake' //etc...
There are multiple ways to make queries with NH.
HQL, the classic way, a powerful object oriented query language. Disadvantage: appears in strings in the code (actually: there is no editor support).
Criteria, a classic way to create dynamic queries without string manipulations. Disadvantages: not as powerful as HQL and not as typesafe as its successors.
QueryOver, a successor of Criteria, which has a nicer syntax and is more type safe.
LINQ, now based on HQL, is more integrated than HQL and typesafe, and generally a matter of taste.
SQL as a fallback for cases where you need something you can't get the object oriented way.
I would recommend HQL or LINQ for regular queries, QueryOver (resp. Criteria) for dynamic queries and SQL only if there isn't any other way.
As for your specific problem, whose details I don't know: if all the information you need for the query is available in the object-oriented model, you should be able to solve it with HQL.

How to create dynamic and safe queries

A "static" query is one that remains the same at all times. For example, the "Tags" button on Stackoverflow, or the "7 days" button on Digg. In short, they always map to a specific database query, so you can create them at design time.
But I am trying to figure out how to do "dynamic" queries where the user basically dictates how the database query will be created at runtime. For example, on Stackoverflow, you can combine tags and filter the posts in ways you choose. That's a dynamic query albeit a very simple one since what you can combine is within the world of tags. A more complicated example is if you could combine tags and users.
First of all, when you have a dynamic query, it sounds like you can no longer use the parameter substitution API to avoid SQL injection, since the query elements depend on what the user decided to include in the query. I can't see how else to build this query other than by appending strings.
Secondly, the query could potentially span multiple tables. For example, if SO allows users to filter based on Users and Tags, and these probably live in two different tables, building the query gets a bit more complicated than just appending columns and WHERE clauses.
How do I go about implementing something like this?
The first rule is that users are allowed to specify values in SQL expressions, but not SQL syntax. All query syntax should be literally specified by your code, not user input. The values that the user specifies can be provided to the SQL as query parameters. This is the most effective way to limit the risk of SQL injection.
Many applications need to "build" SQL queries through code, because as you point out, some expressions, table joins, order by criteria, and so on depend on the user's choices. When you build a SQL query piece by piece, it's sometimes difficult to ensure that the result is valid SQL syntax.
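A minimal sketch of that rule in Python with sqlite3: the user picks a sort option by key, the code maps the key to an SQL fragment it wrote itself, and the author name is bound as a parameter. The SORT_OPTIONS mapping, the list_posts helper, and the post table are hypothetical names for this example.

```python
import sqlite3

# Syntax comes only from this table, never from the request itself.
SORT_OPTIONS = {
    "newest": "created_at DESC",
    "title": "title ASC",
    "votes": "votes DESC",
}


def list_posts(conn: sqlite3.Connection, author: str, sort_key: str):
    """Values are bound as parameters; syntax is chosen from a fixed whitelist."""
    order_by = SORT_OPTIONS.get(sort_key)
    if order_by is None:
        raise ValueError(f"unknown sort option: {sort_key!r}")
    sql = "SELECT title FROM post WHERE author = ? ORDER BY " + order_by
    return [row[0] for row in conn.execute(sql, [author])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (title TEXT, author TEXT, votes INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?, ?)",
    [
        ("First", "ann", 3, "2020-01-01"),
        ("Second", "ann", 10, "2020-02-01"),
        ("Other", "bob", 5, "2020-03-01"),
    ],
)

by_votes = list_posts(conn, "ann", "votes")
by_title = list_posts(conn, "ann", "title")
```

An unrecognised sort key is rejected before any SQL is assembled, so text such as "votes; DROP TABLE post" never reaches the database.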
I worked on a PHP class called Zend_Db_Select that provides an API to help with this. If you like PHP, you could look at that code for ideas. It doesn't handle any query imaginable, but it does a lot.
Some other PHP database frameworks have similar solutions.
Though not a general solution, here are some steps that you can take to mitigate the dynamic yet safe query issue.
A criterion in which a column value must belong to a set of values of arbitrary size does not need to be dynamic. Consider using either the instr function or a special filtering table that you join against. This approach can easily be extended to multiple columns as long as the number of columns is known. Filtering on users and tags could easily be handled this way.
When the number of columns in the filtering criteria is arbitrary yet small, consider using different static queries for each possibility.
Only when the number of columns in the filtering criteria is arbitrary and potentially large should you consider using dynamic queries. In which case...
To be safe from SQL injection, either build or obtain a library that defends against that attack. Though more difficult, this is not an impossible task. This is mostly about escaping SQL string delimiters in the values to filter for.
To be safe from expensive queries, consider using views that are specially crafted for this purpose and some up front logic to limit how those views will get invoked. This is the most challenging in terms of developer time and effort.
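The filtering-table idea from the first step above can be sketched as follows: the set of wanted values is loaded into a temporary table, and one fixed query joins against it, however many values there are. This assumes SQLite (TEMP tables, INSERT OR IGNORE); the post_tag and wanted_tag tables and the helper name are made up for the example.

```python
import sqlite3
from typing import Iterable, List


def posts_with_any_tag(conn: sqlite3.Connection, tags: Iterable[str]) -> List[int]:
    """Load the wanted tags into a temp table, then run one static join."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted_tag (tag TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM wanted_tag")
    conn.executemany(
        "INSERT OR IGNORE INTO wanted_tag (tag) VALUES (?)", [(t,) for t in tags]
    )
    rows = conn.execute(
        "SELECT DISTINCT pt.post_id FROM post_tag AS pt "
        "JOIN wanted_tag AS w ON w.tag = pt.tag "
        "ORDER BY pt.post_id"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post_tag (post_id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO post_tag VALUES (?, ?)",
    [(1, "sql"), (1, "oracle"), (2, "python"), (3, "sql"), (3, "python")],
)

sql_posts = posts_with_any_tag(conn, ["sql"])
mixed_posts = posts_with_any_tag(conn, ["python", "oracle"])
```

The SELECT text is identical for one tag or a hundred, which keeps the query static while the filter values stay ordinary bound data.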
If you are using Python to access your database, I would suggest the Django model system. There are many similar APIs, both for Python and for other languages (notably in Ruby on Rails). I save a great deal of time by not having to talk directly to the database with SQL.
From the example link:
# Model definition
class Blog(models.Model):
    name = models.CharField(max_length=100)
    tagline = models.TextField()

    def __unicode__(self):
        return self.name

# Model usage (this is effectively an insert statement)
from mysite.blog.models import Blog
b = Blog(name='Beatles Blog', tagline='All the latest Beatles news.')
b.save()
Queries can get much more complex: you pass around a query object and add filters and sort elements to it. When you are finally ready to use the query, Django creates an SQL statement that reflects all the ways you adjusted the query object. I think that it is very neat.
Other advantages of this abstraction
Your models can be created as database tables with foreign keys and constraints by Django
Many databases are supported (PostgreSQL, MySQL, SQLite, etc.)
Django analyses your models and creates an automatic admin site out of them.
Well the options have to map to something.
Concatenating the SQL query string isn't a problem if you still use parameters for the option values.