Optimal set of charecters to disallow/allow that prevent SQL injection perfectly?

Optimal set of charecters to disallow/allow that prevent SQL injection perfectly? - sql

I know that sql injection must rely on certain set of special characters used in SQL. So here's the question:
What would be the minimal set of characters to disallow, or as large as possible, a wet to allow, that would prevent SQL injection perfectly?
I understand This is not the best nor the easiest approach to prevent SQL injection, but i just wonder questions like "can you do it even without xxx".
Answering a comment:
Just for the purpose of curiosity, some languages other than English can indeed be written normally without some special characters. for example：
Chinese:“”‘’，。！？：；——
Japanese:””’’、。！？：；ーー
English:""'',.!?：;--

You may want to look up some examples of the kinds of ways a SQL injection attack can be leveraged against a web application with weak security architecture. As far as the "WHY" you'll need to analyze the kinds of site user hacks that can get through what's supposed to be a locked down (but public facing) system.
I turned up this reference for further analysis:
You can see more at the source of this injection:
http://www.unixwiz.net/techtips/sql-injection.html
I know that sql injection must rely on certain set of special characters used in SQL. So here's the question:
There isn't any one character that is absolutely banned, nor are there others that are trusted 100% safe.
EDIT: Response to a revised OP...
A brief history of a type of SQL Injection vulnerability: Without developing data access APIs for your database inputs, the most common approach is "dynamically" constructing SQL statements, then executing the raw SQL (a dangerous mix of developer prepared code and outside user input).
Consider in pseudocode:
#SQL_QUERY = "SELECT NAME, SOCIAL_SECURITY_NUMBER, PASSWD, BANK_ACCOUNT_NUMBER" +
"FROM SECURE_TABLE " + "WHERE NAME = " + #USER_INPUT_NAME;
$CONNECT_TO SQL_DB;
EXECUTE #SQL_QUERY;
...
The expected input when obeyed is a lookup of specific user information based on a single name value input from some user form.
While it isn't my point to accidentally teach a how-to for a new generation of would-be SQL Injection hackers, you can guess that a long concatenated string of code isn't really secure if user input on a web form can also tag on anything they like to the code; including more code of their own.
The utility or danger with using dynamic sql coding is a discussion by itself. You can put a few barriers in front of a dynamic query such as this:
An "input handler" procedure that all inputs from app forms go through before the query code module is executed:
run_query( parse_and_clean( #input_value ) )
This is where you look for things like string inputs too long for expected sizes (a 500 character name???? non-alpha numerical name values?)
Web forms can also have their own scripted input handlers. Usually intended for validating for specific inputs (not empty, longer than three characters, no punctuation, format masks like "1-XXX-XXX-XXXX", etc.) They can also scrub for input hacks such as html coding tags, javascript, urls with remote calls, etc. Sometimes this need is best served here because the offending input is caught before it gets anywhere near the database.
False positives: more often it really is just user error sending malformatted input.
These examples date to the really early days of the Internet. There are lots of source code modules/libraries, add-ons, open-source and paid variants of attempts to make SQL injection much harder to break through into your host systems. Dig around; there is even a huge thread on Stack Overflow that discusses this exact topic.
For the time I have been stumbling around and developing for the Internet, I share a common observation: "perfectly" is a description of a system that has already been hacked. You won't get there, and that isn't the point. You should look towards being the "easy, quick target" (i.e., the Benz in the gas station with windows down, keys in ignition and owner in the restroom)
When you say "can you do it even without...?" The answer is "sure" if it's a "Hello World" type project for some classroom assignment... hosted on your local machine... behind your router/firewall. After all, you gotta learn somewhere. Just be smart where you deploy stuff that isn't so secured.
It's just when you put things out open to the "world" on the Internet, you should always be thinking how the "same task" has to be done differently with a security mindset. Same outputs as your test/dev workspace but with more precautions such as locking down obvious and commonly exploited weaknesses.
Enjoy your additional research, and hopefully this post (devoid of code examples) can encourage your mindset in helpful directions.
Onward.

I know that sql injection must rely on certain set of special characters used in SQL. So here's the question:
What would be the minimal set of characters to disallow, or as large as possible, a wet to allow, that would prevent SQL injection perfectly?
This is the wrong approach to this topic. Instead you need to understand why and how SQL injections happen. Then you can work out measures which prevent them.
Because the cause of SQL injections is the incorporation of user supplied data into SQL command. If the user supplied fragments are not processed properly, they could modify the intention of the SQL command.
Here the aspect of proper processing is crucial, which again depends on the developer’s intention of how the user provided data is supposed to be interpreted. In most cases it’s a literal value (e. g., string, number, date, etc.) and sometimes a SQL keyword (e. g., sort order ASC or DESC, operators, etc.) or even an identifier (e. g., column name, etc.).
These all follow certain syntax rules which can the user provided data can either be validated against and/or enforced:
Prepared statements are the first choice for literal values. They separate the SQL statement and parameter values so that latter can’t be mistakenly interpreted as something other than literal values.
White listing can always be used when the set of valid values it limited, which is the case for SQL keywords and identifiers.
So looking at the use cases for user provided data and the corresponding options that prevent SQL injections, there is no need to restrict the set of allowed characters globally. Better stick to the methods which are best practice and have proved effective.
And if you happen to have a use case that doesn’t fit the mentioned, e. g., allowing users to provide something more complex than an atomic language elements like an arbitrary filter expression, you should break them down into their atoms to be able to ensure their integrity.

Related

Is this a good approach for avoiding SQL injection?

Here in the company I work, we have a support tool that, among other things, provides a page that allows the user to run SELECT queries. It should prevent the user from running UPDATE, INSERT, DELETE, DROP, etc. Besides that, every select statement is accepted.
The way it works is by executing
SELECT * FROM (<query>)
so any statement besides a SELECT should fail due to a syntax error.
In my opinion, this approach is not enough to prevent an attack since anything could change the out-query and break the security. I affirmed that along with that solution it should also check the syntax of the inside query. My colleagues asked me to prove that the current solution is unsafe.
To test it, I tried to write something like
SELECT * from dual); DROP table users --
But it failed because of the ; character that is not accepted by the SQL connector.
So, is there any way to append a modification statement in a query like that?
By the way, it is Oracle SQL.
EDIT:
Just to put it more clear: I know this is not a good approach. But I must prove it to my colleagues to justify a code modification. Theoretical answers are good, but I think a real injection would be more efficient.

The protection is based on the idea/assumption that "update queries" are never going to produce a result table (which is what it would take to make it a valid sub-expression to your SELECT FROM (...) ).
Proprietary engines with proprietary language extensions might undermine that assumption. And although admittedly it still seems unlikely, in the world of proprietary extensions there really is some crazy stuff flying around so don't assume too lightly.
Maybe also beware of expression compilers that coerce "does not return a table" into "an empty table of some kind". You know. Because any system must do anything it can to make the user action succeed instead of fail/crash/...
And maybe also consider that if "query whatever you like" is really the function that is needed, then your DBMS most likely already has some tool or component that actually allows that ... (and is even designed specifically for the purpose).

I'm going to assume that it's deemed acceptable for users to see any data accessible from that account (as that is what this seems designed to do).
It's also fairly trivial to perform a Denial of Service with this, either with an inefficient query, or with select for update, which could be used to lock critical tables.
Oracle is a feature rich DB, and that means there is likely a variety of ways to run DML from within a query. You would need to find an inline PL/SQL function that allow you to perform DML or have other side effects. It will depend on the specific schema as to what packages are available - the XML DB packages have some mechanisms to run arbitrary SQL, the UTL_HTTP packages can often be used to launch network attacks, and the java functionality is quite powerful.
The correct way to protect against this is to use the DB security mechanisms - run this against a read-only schema (one with query privs only on the tables).

Why is SQL text-based?

I suppose this isn't a particularly "good fit for our Q&A format" but no idea where else to ask it.
Why is SQL-querying text based? Is there a historical motivation, or just laziness?
mysqli_query("SELECT * FROM users WHERE Name=$name");
It opens up an incredible number of stupid mistakes like above, encouraging SQL injection. It's basically exec for SQL, and exec'ing code, especially dynamically generated, is widely discouraged.
Plus, to be honest, prepared statements seem like just a verbose workaround.
Why don't we have a more object-oriented, noninjectable way of doing it, like having a directly-accessible database object with certain methods:
$DB->setDB("mysite");
$table='users';
$col='username';
$DB->selectWhereEquals($table,$col,$name);
Which the database itself natively implements, completely eliminating all resemblance to exec and nullifying the entire basis of SQL injection.
Culminating in the real question, is there any database framework that does anything like this?

Why are programming languages text-based? Quite simply: because text can be used to represent powerful human-readable/editable DSLs (Domain-specific language), such as SQL.
The better (?) question is: Why do programmers refuse to use placeholders?
Modern database providers (e.g. ADO.NET, PDO) support placeholders and an appropriate range of database adapters1. Programmers who fail to take advantage of this support only have themselves to blame.
Besides ubiquitous support for placeholders in direct database providers, there are also many different APIs available including:
"Fluent database libraries"; these generally map an AST, such as LINQ, onto a dynamically generated SQL query and take care of the details such as correctly using placeholders.
A wide variety of ORMs; may or may not include a fluent API.
Primitive CRUD wrappers as found in the Android SQLite API, which looks suspiciously similar to the proposal.
I love the power of SQL and almost none of the queries I write can be expressed in such a trivial API. Even LINQ, which is a powerful provider, can sometimes get in my way (although the prudent use of Views helps).
1 Even though SQL is text-based (and such is used to pass the shape of the query to the back-end), bound data need not be included in-line, which is where the issue of SQL Injection (being able to change the shape of the query) comes in. The database adapter can, and does if it is smart enough, use the API exposed by the target database. In the case of SQL Server this amounts to sending the data separately [via RPC] and for SQLite this means creating and binding separate prepared-statement objects.

Is htmlencoding a suitable solution to avoiding SQL injection attacks?

I've heard it claimed that the simplest solution to preventing SQL injection attacks is to html encode all text before inserting into the database. Then, obviously, decode all text when extracting it. The idea being that if the text only contains ampersands, semi-colons and alphanumerics then you can't do anything malicious.
While I see a number of cases where this may seem to work, I foresee the following problems in using this approach:
It claims to be a silver bullet. Potentially stopping users of this technique from understanding all the possible related issues - such as second-order attacks.
It doesn't necessarily prevent any second-order / delayed payload attacks.
It's using a tool for a purpose other than that which it was designed for. This may lead to confusion amongst future users/developers/maintainers of the code. It's also likley to be far from optimal in performance of effect.
It adds a potential performance hit to every read and write of the database.
It makes the data harder to read directly from the database.
It increases the size of the data on disk. (Each character now being ~5 characters - In turn this may also impact disk space requirements, data paging, size of indexes and performance of indexes and more?)
There are potential issues with high range unicode characters and combining characters?
Some html [en|de]coding routines/libraries behave slightly differently (e.g. Some encode an apostrophe and some don't. There may be more differences.) This then ties the data to the code used to read & write it. If using code which [en|de]codes differently the data may be changed/corrupted.
It potentially makes it harder to work with (or at least debug) any text which is already similarly encoded.
Is there anything I'm missing?
Is this actually a reasonable approach to the problem of preventing SQL injection attacks?
Are there any fundamental problems with trying to prevent injection attacks in this way?

You should prevent sql injection by using parameter bindings (eg. never concatenate your sql strings with user input, but use place holders for your parameters and let the framework you use do the right escaping). Html encoding, on the other hand, should be used to prevent cross-site scripting.

Absolutely not.
SQL injections should be prevented by parametrized queries. Or in the worst case by escaping the SQL parameter for SQL, not HTML. Each database has its own rules about this, mysql API (and most frameworks) for example provides a particular function for that. Data itself in the database should not be modified when stored.
Escaping HTML entities prevents XSS and other attacks when returning web content to clients' browsers.

How you get the idea that HTML Encoded text only contains ampersands, semi-colons and alphanumerics after decoding?
I can really encode a "'" in HTML - and that is one of the things needed to get yo into trouble (as it is a string delimiter in SQL).
So, it works ONLY if you put the HTML encoded text into the database.
THEN you havequite some trouble with any text search... and presentation of readable text outside (like in SQL manager). I would consider that a really bad architected sitaution as you have not solved the issue just duct-taped away an obvious attack vector.
Numeric fields are still problematic, unless your HTML handling is perfect, which I would not assume given that workaround.
Use SQL parameters ;)

The single character that enables SQL injection is the SQL string delimer ', also known as hex 27 or decimal 39.
This character is represented in the same way in SQL and in HTML. So an HTML encode does not affect SQL injection attacks at all.

Does using non-SQL databases obviate the need for guarding against "SQL injection"?

This may seem like an obvious (or not so obvious) question, but let me explain. I'm coding up a Google App Engine site using Google's database technology, BigTable. Any App Engine coders will know that Google has its own limited query language called GQL. As a result, I am tempted not to do any checking for SQL (or GQL) injection in my app since I assume Google is not using a raw string query on its backend methods to fetch data.
Furthermore, libraries for DB technologies like CouchDB, MongoDB, and other object or document (aka NoSQL) databases seem to obviate the need to check if a malicious user is injecting database manipulation commands. They often have libraries that directly map the objects in the database to object in your language of choice. I know there are many SQL libraries that do this as well, but I assume that at some level they are combining parameters to run a query over a string, and thusly I must still use SQL Injection protection even with those frameworks.
Am I being short-sighted? Or is it only a matter of time till the next great DB system takes hold and then I will see injection into those systems?

“Injection” holes are to do with text context mismatches. Every time you put a text string into another context of string you need to do encoding to fit the changed context. It seems seductively simple to blindly stuff strings together, but the difficulty of string processing is deceptive.
Databases with a purely object-based interface are immune to injection vulnerabilities, just like parameterised queries are in SQL. There is nothing an attacker can put in his string to break out of the string literal context in which you've put him.
But GQL specifically is not one of these. It's a string query language, and if you go concatenating untrusted unescaped material into a query like "WHERE title='%s'" % title, you're just as vulnerable as you were with full-on SQL. Maybe the limited capabilities of GQL make it more difficult to exploit that to completely compromise the application, but certainly not impossible in general, and in the very best case your application is still wrong and will fall over when people try to legitimately use apostrophes.
GQL has a parameter binding interface. Use it. Resist the allure of string hacking.

SQL-subsets like GQL obviously still concern themselves with it -- but pure non-SQL databases like CouchDB, Voldemort, etc should put & get data without concern for SQL-injection-style attacks.
That however does not excuse you from doing content validation, because while it might not break the database, it may break your application and allow things like XSS (if it is a web app).

Anytime data that is from or manipulated by user input is used to control the execution of code, there needs to be sanitization. I've seen cases where code used user input to execute a command without sanitizing the input. It hadn't been exploited, but if it had been it would have been a horrible attack vector.

SQl Injection is only a subset of a type of security flaw in which any uncontrolled input gets evaluated.
techincally, you could "inject" javascript, among others.

Catching SQL Injection and other Malicious Web Requests

I am looking for a tool that can detect malicious requests (such as obvious SQL injection gets or posts) and will immediately ban the IP address of the requester/add to a blacklist. I am aware that in an ideal world our code should be able to handle such requests and treat them accordingly, but there is a lot of value in such a tool even when the site is safe from these kinds of attacks, as it can lead to saving bandwidth, preventing bloat of analytics, etc.
Ideally, I'm looking for a cross-platform (LAMP/.NET) solution that sits at a higher level than the technology stack; perhaps at the web-server or hardware level. I'm not sure if this exists, though.
Either way, I'd like to hear the community's feedback so that I can see what my options might be with regard to implementation and approach.

Your almost looking at it the wrong way, no 3party tool that is not aware of your application methods/naming/data/domain is going to going to be able to perfectly protect you.
Something like SQL injection prevention is something that has to be in the code, and best written by the people that wrote the SQL, because they are the ones that will know what should/shouldnt be in those fields (unless your project has very good docs)
Your right, this all has been done before. You dont quite have to reinvent the wheel, but you do have to carve a new one because of a differences in everyone's axle diameters.
This is not a drop-in and run problem, you really do have to be familiar with what exactly SQL injection is before you can prevent it. It is a sneaky problem, so it takes equally sneaky protections.
These 2 links taught me far more then the basics on the subject to get started, and helped me better phrase my future lookups on specific questions that weren't answered.
SQL injection
SQL Injection Attacks by Example
And while this one isnt quite a 100% finder, it will "show you the light" on existing problem in your existing code, but like with webstandards, dont stop coding once you pass this test.
Exploit-Me

The problem with a generic tool is that it is very difficult to come up with a set of rules that will only match against a genuine attack.
SQL keywords are all English words, and don't forget that the string
DROP TABLE users;
is perfectly valid in a form field that, for example, contains an answer to a programming question.
The only sensible option is to sanitise the input before ever passing it to your database but pass it on nonetheless. Otherwise lots of perfectly normal, non-malicious users are going to get banned from your site.

One method that might work for some cases would be to take the sql string that would run if you naively used the form data and pass it to some code that counts the number of statements that would actually be executed. If it is greater than the number expected, then there is a decent chance that an injection was attempted, especially for fields that are unlikely to include control characters such as username.
Something like a normal text box would be a bit harder since this method would be a lot more likely to return false positives, but this would be a start, at least.

One little thing to keep in mind: In some countries (i.e. most of Europe), people do not have static IP Addresses, so blacklisting should not be forever.

Oracle has got an online tutorial about SQL Injection. Even though you want a ready-made solution, this might give you some hints on how to use it better to defend yourself.

Now that I think about it, a Bayesian filter similar to the ones used to block spam might work decently too. If you got together a set of normal text for each field and a set of sql injections, you might be able to train it to flag injection attacks.

One of my sites was recently hacked through SQL Injection. It added a link to a virus for every text field in the db! The fix was to add some code looking for SQL keywords. Fortunately, I've developed in ColdFiusion, so the code sits in my Application.cfm file which is run at the beginning of every webpage & it looks at all the URL variables. Wikipedia has some good links to help too.

Interesting how this is being implemented years later by google and them removing the URL all together in order to prevent XSS attacks and other malicious acitivites

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas