I built a prototype system with some database queries. Since it was just a prototype and I was very new to databases, I just used a direct string. This is the string I used and it works fine:
command = New OleDbCommand("SELECT * FROM " + prefix + "CanonicForms WHERE Type=1 AND Canonic_Form='" + item + "'", dictionary_connection)
Now, in putting it in a real system, I wanted to use the more secure parametized method, so after some googling I came up with this:
command = New OleDbCommand("SELECT * FROM #prefix WHERE Type=1 AND Canonic_Form=#form", dictionary_connection)
command.Parameters.AddWithValue("#prefix", prefix + "CanonicForms")
command.Parameters.AddWithValue("#form", item)
But all I get is an error for an incomplete query clause. What have I done differently between the two?
Your table name can't be a parameter. You might have to do some form of concatenation. It's not really a parameter in the formal sense of the word.
As ek_ny states, your table name can not be a parameter.
If you are really paranoid about injection then check the passed in table name against a white list of allowable values prior to buidling up the SQL string.
#cost, I agree that it would be nice but its just not a feature that exists. Generally its assumed that your WHERE clauses are the dynamic portion of the query and everything else, including the SELECT list and the table name, are static. In fact, most of the WHERE clause isn't even dynamic, only the search values themselves. You can't dynamically add column names this way either. This could definitely be built but it would require that the query engine be more aware of the database engine which is a can of worms that Microsoft didn't feel like opening.
Imagine a drop down menu with table names such as 'Jobs', 'People', 'Employers' that some naive developer built. On postback that value is used to SELECT * FROM. A simple SQL injection would be all that it takes to jump to another table not listed in that menu. And passing something really weird could very easily throw an error that could reveal something about the database. Value-only parametrized queries can still be broken but the surface area is much smaller.
Some people with multiple table prefixes use schemas, is that an option for you?
Related
I saw a query run in a log file on an application. and it contained a query like:
SELECT ID FROM CUST_ATTR49 WHERE 1=0
what is the use of such a query that is bound to return nothing?
A query like this can be used to ping the database. The clause:
WHERE 1=0
Ensures that non data is sent back, so no CPU charge, no Network traffic or other resource consumption.
A query like that can test for:
server availability
CUST_ATTR49 table existence
ID column existence
Keeping a connection alive
Cause a trigger to fire without changing any rows (with the where clause, but not in a select query)
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
This may be also used to extract the table schema from a table without extracting any data inside that table. As Andrea Colleoni said those will be the other benefits of using this.
A usecase I can think of: you have a filter form where you don't want to have any search results. If you specify some filter, they get added to the where clause.
Or it's usually used if you have to create a sql query by hand. E.g. you don't want to check whether the where clause is empty or not..and you can just add stuff like this:
where := "WHERE 0=1"
if X then where := where + " OR ... "
if Y then where := where + " OR ... "
(if you connect the clauses with OR you need 0=1, if you have AND you have 1=1)
As an answer - but also as further clarification to what #AndreaColleoni already mentioned:
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
Purpose as an on/off switch
I am using this as a switch (on/off) statement for portions of my Query.
If I were to use
WHERE 1=1
AND (0=? OR first_name = ?)
AND (0=? OR last_name = ?)
Then I can use the first bind variable (?) to turn on or off the first_name search criterium. , and the third bind variable (?) to turn on or off the last_name criterium.
I have also added a literal 1=1 just for esthetics so the text of the query aligns nicely.
For just those two criteria, it does not appear that helpful, as one might thing it is just easier to do the same by dynamically building your WHERE condition by either putting only first_name or last_name, or both, or none. So your code will have to dynamically build 4 versions of the same query. Imagine what would happen if you have 10 different criteria to consider, then how many combinations of the same query will you have to manage then?
Compile Time Optimization
I also might add that adding in the 0=? as a bind variable switch will not work very well if all your criteria are indexed. The run time optimizer that will select appropriate indexes and execution plans, might just not see the cost benefit of using the index in those slightly more complex predicates. Hence I usally advice, to inject the 0 / 1 explicitly into your query (string concatenating it in in your sql, or doing some search/replace). Doing so will give the compiler the chance to optimize out redundant statements, and give the Runtime Executer a much simpler query to look at.
(0=1 OR cond = ?) --> (cond = ?)
(0=0 OR cond = ?) --> Always True (ignore predicate)
In the second statement above the compiler knows that it never has to even consider the second part of the condition (cond = ?), and it will simply remove the entire predicate. If it were a bind variable, the compiler could never have accomplished this.
Because you are simply, and forcedly, injecting a 0/1, there is zero chance of SQL injections.
In my SQL's, as one approach, I typically place my sql injection points as ${literal_name}, and I then simply search/replace using a regex any ${...} occurrence with the appropriate literal, before I even let the compiler have a stab at it. This basically leads to a query stored as follows:
WHERE 1=1
AND (0=${cond1_enabled} OR cond1 = ?)
AND (0=${cond2_enabled} OR cond2 = ?)
Looks good, easily understood, the compiler handles it well, and the Runtime Cost Based Optimizer understands it better and will have a higher likelihood of selecting the right index.
I take special care in what I inject. Prime way for passing variables is and remains bind variables for all the obvious reasons.
This is very good in metadata fetching and makes thing generic.
Many DBs have optimizer so they will not actually execute it but its still a valid SQL statement and should execute on all DBs.
This will not fetch any result, but you know column names are valid, data types etc. If it does not execute you know something is wrong with DB(not up etc.)
So many generic programs execute this dummy statement for testing and fetching metadata.
Some systems use scripts and can dynamically set selected records to be hidden from a full list; so a false condition needs to be passed to the SQL. For example, three records out of 500 may be marked as Privacy for medical reasons and should not be visible to everyone. A dynamic query will control the 500 records are visible to those in HR, while 497 are visible to managers. A value would be passed to the SQL clause that is conditionally set, i.e. ' WHERE 1=1 ' or ' WHERE 1=0 ', depending who is logged into the system.
quoted from Greg
If the list of conditions is not known at compile time and is instead
built at run time, you don't have to worry about whether you have one
or more than one condition. You can generate them all like:
and
and concatenate them all together. With the 1=1 at the start, the
initial and has something to associate with.
I've never seen this used for any kind of injection protection, as you
say it doesn't seem like it would help much. I have seen it used as an
implementation convenience. The SQL query engine will end up ignoring
the 1=1 so it should have no performance impact.
Why would someone use WHERE 1=1 AND <conditions> in a SQL clause?
If the user intends to only append records, then the fastest method is open the recordset without returning any existing records.
It can be useful when only table metadata is desired in an application. For example, if you are writing a JDBC application and want to get the column display size of columns in the table.
Pasting a code snippet here
String query = "SELECT * from <Table_name> where 1=0";
PreparedStatement stmt = connection.prepareStatement(query);
ResultSet rs = stmt.executeQuery();
ResultSetMetaData rsMD = rs.getMetaData();
int columnCount = rsMD.getColumnCount();
for(int i=0;i<columnCount;i++) {
System.out.println("Column display size is: " + rsMD.getColumnDisplaySize(i+1));
}
Here having a query like "select * from table" can cause performance issues if you are dealing with huge data because it will try to fetch all the records from the table. Instead if you provide a query like "select * from table where 1=0" then it will fetch only table metadata and not the records so it will be efficient.
Per user milso in another thread, another purpose for "WHERE 1=0":
CREATE TABLE New_table_name as select * FROM Old_table_name WHERE 1 =
2;
this will create a new table with same schema as old table. (Very
handy if you want to load some data for compares)
An example of using a where condition of 1=0 is found in the Northwind 2007 database. On the main page the New Customer Order and New Purchase Order command buttons use embedded macros with the Where Condition set to 1=0. This opens the form with a filter that forces the sub-form to display only records related to the parent form. This can be verified by opening either of those forms from the tree without using the macro. When opened this way all records are displayed by the sub-form.
In ActiveRecord ORM, part of RubyOnRails:
Post.where(category_id: []).to_sql
# => SELECT * FROM posts WHERE 1=0
This is presumably because the following is invalid (at least in Postgres):
select id FROM bookings WHERE office_id IN ()
It seems like, that someone is trying to hack your database. It looks like someone tried mysql injection. You can read more about it here: Mysql Injection
I'm here to share a consolidated analysis for the following scenario:
I've an 'Item' table and I've a search SP for it. I want to be able to search for multiple ItemCodes like:
- Table structure : Item(Id INT, ItemCode nvarchar(20))
- Filter query format: SELECT * FROM Item WHERE ItemCode IN ('xx','yy','zz')
I want to do this dynamically using stored procedure. I'll pass an #ItemCodes parameter which will have comma(',') separated values and the search shud be performed as above.
Well, I've already visited lot of posts\forums and here're some threads:
Dynamic SQL might be a least complex way but I don't want to consider it because of the parameters like performance,security (SQL-Injection, etc..)..
Also other approaches like XML, etc.. if they make things complex I can't use them.
And finally, no extra temp-table JOIN kind of performance hitting tricks please.
I've to manage the performance as well as the complexity.
T-SQL stored procedure that accepts multiple Id values
Passing an "in" list via stored procedure
I've reviewed the above two posts and gone thru some solutions provided, here're some limitations:
http://www.sommarskog.se/arrays-in-sql-2005.html
This will require me to 'declare' the parameter-type while passing it to the SP, it distorts the abstraction (I don't set type in any of my parameters because each of them is treated in a generic way)
http://www.sqlteam.com/article/sql-server-2008-table-valued-parameters
This is a structured approach but it increases complexity, required DB-structure level changes and its not abstract as above.
http://madprops.org/blog/splitting-text-into-words-in-sql-revisited/
Well, this seems to match-up with my old solutions. Here's what I did in the past -
I created an SQL function : [GetTableFromValues] (returns a temp table populated each item (one per row) from the comma separated #ItemCodes)
And, here's how I use it in my WHERE caluse filter in SP -
SELECT * FROM Item WHERE ItemCode in (SELECT * FROM[dbo].[GetTableFromValues](#ItemCodes))
This one is reusable and looks simple and short (comparatively of course). Anything I've missed or any expert with a better solution (obviously 'within' the limitations of the above mentioned points).
Thank you.
I think using dynamic T-SQL will be pragmatic in this scenario. If you are careful with the design, dynamic sql works like a charm. I have leveraged it in countless projects when it was the right fit. With that said let me address your two main concerns - performance and sql injection.
With regards to performance, read T-SQL reference on parameterized dynamic sql and sp_executesql (instead of sp_execute). A combination of parameterized sql and using sp_executesql will get you out of the woods on performance by ensuring that query plans are reused and sp_recompiles avoided! I have used dynamic sql even in real-time contexts and it works like a charm with these two items taken care of. For your satisfaction you can run a loop of million or so calls to the sp with and without the two optimizations, and use sql profiler to track sp_recompile events.
Now, about SQL-injection. This will be an issue if you use an incorrect user widget such as a textbox to allow the user to input the item codes. In that scenario it is possible that a hacker may write select statements and try to extract information about your system. You can write code to prevent this but I think going down that route is a trap. Instead consider using an appropriate user widget such as a listbox (depending on your frontend platform) that allows multiple selection. In this case the user will just select from a list of "presented items" and your code will generate the string containing the corresponding item codes. Basically you do not pass user text to the dynamic sql sp! You can even use slicker JQuery based selection widgets but the bottom line is that the user does not get to type any unacceptable text that hits your data layer.
Next, you just need a simple stored procedure on the database that takes a param for the itemcodes (for e.g. '''xyz''','''abc'''). Internally it should use sp_executesql with a parameterized dynamic query.
I hope this helps.
-Tabrez
I have a query that I would like to filter in different ways at different times. The way I have done this right now by placing parameters in the criteria field of the relevant query fields, however there are many cases in which I do not want to filter on a given field but only on the other fields. Is there any way in which a wildcard of some sort can be passed to the criteria parameter so that I can bypass the filtering for that particular call of the query?
If you construct your query like so:
PARAMETERS ParamA Text ( 255 );
SELECT t.id, t.topic_id
FROM SomeTable t
WHERE t.id Like IIf(IsNull([ParamA]),"*",[ParamA])
All records will be selected if the parameter is not filled in.
Note the * wildcard with the LIKE keyword will only have the desired effect in ANSI-89 Query Mode.
Many people mistakenly assume the wildcard character in Access/Jet is always *. Not so. Jet has two wildcards: % in ANSI-92 Query Mode and * in ANSI-89 Query Mode.
ADO is always ANSI-92 and DAO is always ANSI-89 but the Access interface can be either.
When using the LIKE keyword in a database object (i.e. something that will be persisted in the mdb file), you should to think to yourself: what would happen if someone used this database using a Query Mode other than the one I usually use myself? Say you wanted to restrict a text field to numeric characters only and you'd written your Validation Rule like this:
NOT LIKE "*[!0-9]*"
If someone unwittingly (or otherwise) connected to your .mdb via ADO then the validation rule above would allow them to add data with non-numeric characters and your data integrity would be shot. Not good.
Better IMO to always code for both ANSI Query Modes. Perhaps this is best achieved by explicitly coding for both Modes e.g.
NOT LIKE "*[!0-9]*" AND NOT LIKE "%[!0-9]%"
But with more involved Jet SQL DML/DDL, this can become very hard to achieve concisely. That is why I recommend using the ALIKE keyword, which uses the ANSI-92 Query Mode wildcard character regardless of Query Mode e.g.
NOT ALIKE "%[!0-9]%"
Note ALIKE is undocumented (and I assume this is why my original post got marked down). I've tested this in Jet 3.51 (Access97), Jet 4.0 (Access2000 to 2003) and ACE (Access2007) and it works fine. I've previously posted this in the newsgroups and had the approval of Access MVPs. Normally I would steer clear of undocumented features myself but make an exception in this case because Jet has been deprecated for nearly a decade and the Access team who keep it alive don't seem interested in making deep changes to the engines (or bug fixes!), which has the effect of making the Jet engine a very stable product.
For more details on Jet's ANSI Query modes, see About ANSI SQL query mode.
Back to my previous exampe in your previous question. Your parameterized query is a string looking like that:
qr = "Select Tbl_Country.* From Tbl_Country WHERE id_Country = [fid_country]"
depending on the nature of fid_Country (number, text, guid, date, etc), you'll have to replace it with a joker value and specific delimitation characters:
qr = replace(qr,"[fid_country]","""*""")
In order to fully allow wild cards, your original query could also be:
qr = "Select Tbl_Country.* From Tbl_Country _
WHERE id_Country LIKE [fid_country]"
You can then get wild card values for fid_Country such as
qr = replace(qr,"[fid_country]","G*")
Once you're done with that, you can use the string to open a recordset
set rs = currentDb.openRecordset(qr)
I don't think you can. How are you running the query?
I'd say if you need a query that has that many open variables, put it in a vba module or class, and call it, letting it build the string every time.
I'm not sure this helps, because I suspect you want to do this with a saved query rather than in VBA; however, the easiest thing you can do is build up a query line by line in VBA, and then creating a recordset from it.
A quite hackish way would be to re-write the saved query on the fly and then access that; however, if you have multiple people using the same DB you might run into conflicts, and you'll confuse the next developer down the line.
You could also programatically pass default value to the query (as discussed in you r previous question)
Well, you can return non-null values by passing * as the parameter for fields you don't wish to use in the current filter. In Access 2003 (and possibly earlier and later versions), if you are using like [paramName] as your criterion for a numeric, Text, Date, or Boolean field, an asterisk will display all records (that match the other criteria you specify). If you want to return null values as well, then you can use like [paramName] or Is Null as the criterion so that it returns all records. (This works best if you are building the query in code. If you are using an existing query, and you don't want to return null values when you do have a value for filtering, this won't work.)
If you're filtering a Memo field, you'll have to try another approach.
If I remove all the ' characters from a SQL query, is there some other way to do a SQL injection attack on the database?
How can it be done? Can anyone give me examples?
Yes, there is. An excerpt from Wikipedia
"SELECT * FROM data WHERE id = " + a_variable + ";"
It is clear from this statement that the author intended a_variable to be a number correlating to the "id" field. However, if it is in fact a string then the end user may manipulate the statement as they choose, thereby bypassing the need for escape characters. For example, setting a_variable to
1;DROP TABLE users
will drop (delete) the "users" table from the database, since the SQL would be rendered as follows:
SELECT * FROM DATA WHERE id=1;DROP TABLE users;
SQL injection is not a simple attack to fight. I would do very careful research if I were you.
Yes, depending on the statement you are using. You are better off protecting yourself either by using Stored Procedures, or at least parameterised queries.
See Wikipedia for prevention samples.
I suggest you pass the variables as parameters, and not build your own SQL. Otherwise there will allways be a way to do a SQL injection, in manners that we currently are unaware off.
The code you create is then something like:
' Not Tested
var sql = "SELECT * FROM data WHERE id = #id";
var cmd = new SqlCommand(sql, myConnection);
cmd.Parameters.AddWithValue("#id", request.getParameter("id"));
If you have a name like mine with an ' in it. It is very annoying that all '-characters are removed or marked as invalid.
You also might want to look at this Stackoverflow question about SQL Injections.
Yes, it is definitely possible.
If you have a form where you expect an integer to make your next SELECT statement, then you can enter anything similar:
SELECT * FROM thingy WHERE attributeID=
5 (good answer, no problem)
5; DROP table users; (bad, bad, bad...)
The following website details further classical SQL injection technics: SQL Injection cheat sheet.
Using parametrized queries or stored procedures is not any better. These are just pre-made queries using the passed parameters, which can be source of injection just as well. It is also described on this page: Attacking Stored Procedures in SQL.
Now, if you supress the simple quote, you prevent only a given set of attack. But not all of them.
As always, do not trust data coming from the outside. Filter them at these 3 levels:
Interface level for obvious stuff (a drop down select list is better than a free text field)
Logical level for checks related to data nature (int, string, length), permissions (can this type of data be used by this user at this page)...
Database access level (escape simple quote...).
Have fun and don't forget to check Wikipedia for answers.
Parameterized inline SQL or parameterized stored procedures is the best way to protect yourself. As others have pointed out, simply stripping/escaping the single quote character is not enough.
You will notice that I specifically talk about "parameterized" stored procedures. Simply using a stored procedure is not enough either if you revert to concatenating the procedure's passed parameters together. In other words, wrapping the exact same vulnerable SQL statement in a stored procedure does not make it any safer. You need to use parameters in your stored procedure just like you would with inline SQL.
Also- even if you do just look for the apostrophe, you don't want to remove it. You want to escape it. You do that by replacing every apostrophe with two apostrophes.
But parameterized queries/stored procedures are so much better.
Since this a relatively older question, I wont bother writing up a complete and comprehensive answer, since most aspects of that answer have been mentioned here by one poster or another.
I do find it necessary, however, to bring up another issue that was not touched on by anyone here - SQL Smuggling. In certain situations, it is possible to "smuggle" the quote character ' into your query even if you tried to remove it. In fact, this may be possible even if you used proper commands, parameters, Stored Procedures, etc.
Check out the full research paper at http://www.comsecglobal.com/FrameWork/Upload/SQL_Smuggling.pdf (disclosure, I was the primary researcher on this) or just google "SQL Smuggling".
. . . uh about 50000000 other ways
maybe somthing like 5; drop table employees; --
resulting sql may be something like:
select * from somewhere where number = 5; drop table employees; -- and sadfsf
(-- starts a comment)
Yes, absolutely: depending on your SQL dialect and such, there are many ways to achieve injection that do not use the apostrophe.
The only reliable defense against SQL injection attacks is using the parameterized SQL statement support offered by your database interface.
Rather that trying to figure out which characters to filter out, I'd stick to parametrized queries instead, and remove the problem entirely.
It depends on how you put together the query, but in essence yes.
For example, in Java if you were to do this (deliberately egregious example):
String query = "SELECT name_ from Customer WHERE ID = " + request.getParameter("id");
then there's a good chance you are opening yourself up to an injection attack.
Java has some useful tools to protect against these, such as PreparedStatements (where you pass in a string like "SELECT name_ from Customer WHERE ID = ?" and the JDBC layer handles escapes while replacing the ? tokens for you), but some other languages are not so helpful for this.
Thing is apostrophe's maybe genuine input and you have to escape them by doubling them up when you are using inline SQL in your code. What you are looking for is a regex pattern like:
\;.*--\
A semi colon used to prematurely end the genuine statement, some injected SQL followed by a double hyphen to comment out the trailing SQL from the original genuine statement. The hyphens may be omitted in the attack.
Therefore the answer is: No, simply removing apostrophes does not gaurantee you safety from SQL Injection.
I can only repeat what others have said. Parametrized SQL is the way to go. Sure, it is a bit of a pain in the butt coding it - but once you have done it once, then it isn't difficult to cut and paste that code, and making the modifications you need. We have a lot of .Net applications that allow web site visitors specify a whole range of search criteria, and the code builds the SQL Select statement on the fly - but everything that could have been entered by a user goes into a parameter.
When you are expecting a numeric parameter, you should always be validating the input to make sure it's numeric. Beyond helping to protect against injection, the validation step will make the app more user friendly.
If you ever receive id = "hello" when you expected id = 1044, it's always better to return a useful error to the user instead of letting the database return an error.
I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual sql query gets created dynamically depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help but think there must be a more elegant solution since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the sql in code. Seems godawful. If I really try to support the requested ability to query any combination of any field in any table this is going to be one MASSIVE set of if statements. shiver
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as dynamic WHERE clauses. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code. Makes testing convenient. Note, when you create your sql queries in code, don't concatenate in user input. Use your #variables!
The only alternative is to use the COALESCE operator. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally for name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE
Name = COALESCE(#name, Name) AND
Nickname = COALESCE(#nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in "brian" for #name and null for #nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE
Name = 'brian' AND
Nickname = Nickname
The coalesce operator turns the null into an identity evaluation, which is always true and doesn't affect the where clause.
Search and normalization can be at odds with each other. So probably first thing would be to get some kind of "view" that shows all the fields that can be searched as a single row with a single key getting you the resume. then you can throw something like Lucene in front of that to give you a full text index of those rows, the way that works is, you ask it for "x" in this view and it returns to you the key. Its a great solution and come recommended by joel himself on the podcast within the first 2 months IIRC.
What you need is something like SphinxSearch (for MySQL) or Apache Lucene.
As you said in your example lets imagine a Resume that will composed of several fields:
List item
Name,
Adreess,
Education (this could be a table on its own) or
Work experience (this could grow to its own table where each row represents a previous job)
So searching for a word in all those fields with WHERE rapidly becomes a very long query with several JOINS.
Instead you could change your framework of reference and think of the Whole resume as what it is a Single Document and you just want to search said document.
This is where tools like Sphinx Search do. They create a FULL TEXT index of your 'document' and then you can query sphinx and it will give you back where in the Database that record was found.
Really good search results.
Don't worry about this tools not being part of your RDBMS it will save you a lot of headaches to use the appropriate model "Documents" vs the incorrect one "TABLES" for this application.