I need to search for an entity in my application.
The search panel consists of 8 fields and a user may fill any field he wants.
Will I have to write queries for all possible combinations or what is the right way to do this?
There are many ways to achieve this, though the easiest one is to use dynamic SQL.
This has an associated cost in readability, exposure to SQL Injection and query plan caching and optimization.
As for where you build up the dynamic SQL string - you could do this in a stored procedure if your database supports those - that would be my preference, as you could pass in parameters and have that bit of extra protection against SQL Injection.
You could also build it up in the application, but this would require you to be more careful about the data.
The basic technique would be to build up the different parts of the WHERE clause for the different search fields (possibly starting with WHERE 1 = 1 so you can just add AND clauses without needing to check each clause to see if it is the first one).
I suggest reading through the dynamic SQL article I linked to.
The classic way to do this is
SELECT
<fields>
FROM tablename
WHERE
(field1 like '%[field1 user input]%' OR '[field1 user input]'='')
AND (field2 like '%[field2 user input]%' OR '[field2 user input]'='')
...
AND (field8 like '%[field8 user input]%' OR '[field8 user input]'='')
Every sane query optimizer will optimize away the unneeded conditions
Related
If we take a database table, we can query all the rows or we can choose to apply a filter on it. The filter can vary depending on the user input. In cases when there are few options we can specify different queries for those few specific conditions. But if there are lots and lots of options that user might or might not specify, aforementioned method does not come handy. I know, I can compose the filter based upon the user input and send it as a string to the corresponding stored procedure as a parameter, build the query with that filter and finally execute the query string with the help of EXECUTE IMMEDIATE(In Oracle's case). Don't know why but I really don't like this way of query building. I think this way I leave the doors open for SQL injectors. And besides, that I always have trouble with the query itself as everything is just a string and I need to handle dates and numbers carefully.What is the best and most used method of forming the WHERE clause of a query against a database table?
Using database parameters instead of attempting to quote your literals is the way forward.
This will guard you against SQL injection.
A common way of approaching this problem is building expression trees that represent your query criteria, converting them to parameterized SQL (to avoid SQL injection risks), binding parameter values to the generated SQL, and executing the resultant query against your target database.
The exact approach depends on your client programming framework: .NET has Entity Framework and LINQ2SQL that both support expression trees; Java has Hibernate and JPA, and so on. I have seen several different frameworks used to construct customizable queries, with great deal of success. In situations when these frameworks are not available, you can roll your own, although it requires a lot more work.
A specification essentially is a text string representing a "where" clause created by an end user.
I have stored procedures that copy a set of related tables and records to other places. The operation is always the same, but dependent on some crazy user requirements like "products that are frozen and blue and on sale on Tuesday".
What if we fed the user specification (or string parameter) to a scalar function that returned true/false which executed the specification as dynamic SQL or just exec (#variable).
It could tell us whether those records exist. We could add the result of the function to our copy products where clause.
It would keep us from recompiling the copy script each time our where clauses changed. Plus it would isolate the product selection in to a single function.
Anyone ever do anything like this or have examples? What bad things could come of it?
EDIT:
This is the specification I simply added to the end of each insert/select statement:
and exists (
select null as nothing
from SameTableAsOutsideTable inside
where inside.ID = outside.id and -- Join operations to outside table
inside.page in (6, 7) and -- Criteria 1
inside.dept in (7, 6, 2, 4) -- Criteria 2
)
It would be great to feed a parameter into a function that produces records based on the user criteria, so all that above could be something like:
and dbo.UserCriteria( #page="6,7", #dept="7,6,2,4")
Dynamic Search Conditions in T-SQL
When optimizing SQL the important thing is optimizing the access path to data (ie. index usage). This trumps code reuse, maintainability, nice formatting and just about every other development perk you can think of. This is because a bad access path will cause the query to perform hundreds of times slower than it should. The article linked sums up very well all the options you have, and your envisioned function is nowhere on the radar. Your options will gravitate around dynamic SQL or very complicated static queries. I'm afraid there is no free lunch on this topic.
It doesn't sound like a very good idea to me. Even supposing that you had proper defensive coding to avoid SQL injection attacks it's not going to really buy you anything. The code still needs to be "compiled" each time.
Also, it's pretty much always a bad idea to let users create free-form WHERE clauses. Users are pretty good at finding new and innovative ways to bring a server to a grinding halt.
If you or your users or someone else in the business can't come up with some concrete search requirements then it's likely that someone isn't thinking about it hard enough and doesn't really know what they want. You can have pretty versatile search capabilities without letting the users completely loose on the system. Alternatively, look at some of the BI tools out there and consider creating a data mart where they can do these kinds of ad hoc searches.
How about this:
You create another store procedure (instead of function) and pass the right condition to it.
Based on that condition it dumps the record ids to a temp table.
Next you move procedure will read ids from that table and do the needful things?
Or you could create a user function that returns a table which is nothing but the ids of the records that matches your criteria (dynamic)
If I am totally off, then please clarify me.
Hope this helps.
If you are forced to use dynamic queries and you don't have any solid and predefined search requirements, it is strongly recommended to use sp_executesql instead of EXEC . It provides parametrized queries to prevent SQL Injection attacks (to some extent) and It makes use of execution plans to speed up performance. (More info)
I'm working on a product which gives users a lot of "flexibility" to create sql, ie they can easily set up queries that can bring the system to it's knees with over inclusive where clauses.
I would like to be able to warn users when this is potentially the case and I'm wondering if there is any known strategy for intelligently analysing queries which can be employed to this end?
I feel your pain. I've been tasked with something similar in the past. It's a constant struggle between users demanding all of the features and functionality of SQL while also complaining that it's too complicated, doesn't help them, doesn't prevent them from doing stupid stuff.
Adding paging into the query won't stop bad queries from being executed, but it will reduce the damage. If you only show the first 50 records returned from SELECT * FROM UNIVERSE and provide the ability to page to the next 50 and so on and so forth, you can avoid out of memory issues and reduce the performance hit.
I don't know if it's appropriate for your data/business domain; but I forcefully add table joins when the user doesn't supply them. If the query contains TABLE A and TABLE B, A.ID needs to equal B.ID; I add it.
If you don't mind writing code that is specific to a database, I know you can get data about a query from the database (Explain Plan in Oracle - http://www.adp-gmbh.ch/ora/explainplan.html). You can execute the plan on their query first, and use the results of that to prompt or warn the user. But the details will vary depending on which DB you are working with.
I am working on a problem that I'm certain someone has seen before, but all I found across the net was how not to do it.
Fake table example and dynamic searching.
(Due to my low rating I cannot post images. I know I should be ashamed!!)
Clicking the add button automatically creates another row for adding more criteria choices.
(Note: My table is most definitely more complex)
Now to my issue, I thought I knew how to handle the SQL for this task, but I really don't. The only examples of what I should do are not meant for this sort of dynamic table querying. The examples didn't have the ability to create as as many search filters as a user pleases (or perhaps my understanding was lacking).
Please let me know if my uploaded image is not of good enough quality or if I have not given enough information.
I'm really curious about the best practice for this situation. Thank you in advance.
I had a similar question. You can use dynamic sql with the sp_executesql stored proc where you actually build your select statement as a string and pass it in.
Or you might be able to write a stored proc kinda like the one I created where you have all of the conditions in the where clause but the NULL values are ignored.
Here's the stored proc I came up with for my scenario:
How do I avoid dynamic SQL when using an undetermined number of parameters?
The advantage with the parameterized stored proc I wrote is that I'm able to avoid the SQL injection risks associated with dynamic SQL.
Two main choices:
Linq to Sql allows you to compose a query, add to it, add to it again, and it won't actually compile and execute a SQL statement until you iterate the results.
Or you can use dynamic SQL. The trick to making this easy is the "WHERE (1=1)" technique, but you do have to be careful to use parameters (to avoid SQL injection attacks) and build your sql statements carefully.
The original post:
Write a sql for searching with multiple conditions
select * from thetable
where (#name='' or [name]=#name) and (#age=0 or age=#age)
However, the above query forces table scan. For better performance and more complex scenario (I guess you simplified the question in you original post), consider use dynamic sql. By the way, Linq to SQL can help you build dynamic SQL very easily, like the following:
IQueryable<Person> persons = db.Persons;
if (!string.IsNullOrEmpty(name)) persons = persons.Where(p=>p.Name==name);
if (age != 0) persons = persons.Where(p=>p.Age=age);
Check out SqlBuilder, a utility for Dynamic SQL.
A "static" query is one that remains the same at all times. For example, the "Tags" button on Stackoverflow, or the "7 days" button on Digg. In short, they always map to a specific database query, so you can create them at design time.
But I am trying to figure out how to do "dynamic" queries where the user basically dictates how the database query will be created at runtime. For example, on Stackoverflow, you can combine tags and filter the posts in ways you choose. That's a dynamic query albeit a very simple one since what you can combine is within the world of tags. A more complicated example is if you could combine tags and users.
First of all, when you have a dynamic query, it sounds like you can no longer use the substitution api to avoid sql injection since the query elements will depend on what the user decided to include in the query. I can't see how else to build this query other than using string append.
Secondly, the query could potentially span multiple tables. For example, if SO allows users to filter based on Users and Tags, and these probably live in two different tables, building the query gets a bit more complicated than just appending columns and WHERE clauses.
How do I go about implementing something like this?
The first rule is that users are allowed to specify values in SQL expressions, but not SQL syntax. All query syntax should be literally specified by your code, not user input. The values that the user specifies can be provided to the SQL as query parameters. This is the most effective way to limit the risk of SQL injection.
Many applications need to "build" SQL queries through code, because as you point out, some expressions, table joins, order by criteria, and so on depend on the user's choices. When you build a SQL query piece by piece, it's sometimes difficult to ensure that the result is valid SQL syntax.
I worked on a PHP class called Zend_Db_Select that provides an API to help with this. If you like PHP, you could look at that code for ideas. It doesn't handle any query imaginable, but it does a lot.
Some other PHP database frameworks have similar solutions.
Though not a general solution, here are some steps that you can take to mitigate the dynamic yet safe query issue.
Criteria in which a column value belongs in a set of values whose cardinality is arbitrary does not need to be dynamic. Consider using either the instr function or the use of a special filtering table in which you join against. This approach can be easily extended to multiple columns as long as the number of columns is known. Filtering on users and tags could easily be handled with this approach.
When the number of columns in the filtering criteria is arbitrary yet small, consider using different static queries for each possibility.
Only when the number of columns in the filtering criteria is arbitrary and potentially large should you consider using dynamic queries. In which case...
To be safe from SQL injection, either build or obtain a library that defends against that attack. Though more difficult, this is not an impossible task. This is mostly about escaping SQL string delimiters in the values to filter for.
To be safe from expensive queries, consider using views that are specially crafted for this purpose and some up front logic to limit how those views will get invoked. This is the most challenging in terms of developer time and effort.
If you were using python to access your database, I would suggest you use the Django model system. There are many similar apis both for python and for other languages (notably in ruby on rails). I am saving so much time by avoiding the need to talk directly to the database with SQL.
From the example link:
#Model definition
class Blog(models.Model):
name = models.CharField(max_length=100)
tagline = models.TextField()
def __unicode__(self):
return self.name
Model usage (this is effectively an insert statement)
from mysite.blog.models import Blog
b = Blog(name='Beatles Blog', tagline='All the latest Beatles news.')
b.save()
The queries get much more complex - you pass around a query object and you can add filters / sort elements to it. When you finally are ready to use the query, Django creates an SQL statment that reflects all the ways you adjusted the query object. I think that it is very cute.
Other advantages of this abstraction
Your models can be created as database tables with foreign keys and constraints by Django
Many databases are supported (Postgresql, Mysql, sql lite, etc)
DJango analyses your templates and creates an automatic admin site out of them.
Well the options have to map to something.
A SQL query string CONCAT isn't a problem if you still use parameters for the options.