Is htmlencoding a suitable solution to avoiding SQL injection attacks?

Is htmlencoding a suitable solution to avoiding SQL injection attacks? - sql

I've heard it claimed that the simplest solution to preventing SQL injection attacks is to html encode all text before inserting into the database. Then, obviously, decode all text when extracting it. The idea being that if the text only contains ampersands, semi-colons and alphanumerics then you can't do anything malicious.
While I see a number of cases where this may seem to work, I foresee the following problems in using this approach:
It claims to be a silver bullet. Potentially stopping users of this technique from understanding all the possible related issues - such as second-order attacks.
It doesn't necessarily prevent any second-order / delayed payload attacks.
It's using a tool for a purpose other than that which it was designed for. This may lead to confusion amongst future users/developers/maintainers of the code. It's also likley to be far from optimal in performance of effect.
It adds a potential performance hit to every read and write of the database.
It makes the data harder to read directly from the database.
It increases the size of the data on disk. (Each character now being ~5 characters - In turn this may also impact disk space requirements, data paging, size of indexes and performance of indexes and more?)
There are potential issues with high range unicode characters and combining characters?
Some html [en|de]coding routines/libraries behave slightly differently (e.g. Some encode an apostrophe and some don't. There may be more differences.) This then ties the data to the code used to read & write it. If using code which [en|de]codes differently the data may be changed/corrupted.
It potentially makes it harder to work with (or at least debug) any text which is already similarly encoded.
Is there anything I'm missing?
Is this actually a reasonable approach to the problem of preventing SQL injection attacks?
Are there any fundamental problems with trying to prevent injection attacks in this way?

You should prevent sql injection by using parameter bindings (eg. never concatenate your sql strings with user input, but use place holders for your parameters and let the framework you use do the right escaping). Html encoding, on the other hand, should be used to prevent cross-site scripting.

Absolutely not.
SQL injections should be prevented by parametrized queries. Or in the worst case by escaping the SQL parameter for SQL, not HTML. Each database has its own rules about this, mysql API (and most frameworks) for example provides a particular function for that. Data itself in the database should not be modified when stored.
Escaping HTML entities prevents XSS and other attacks when returning web content to clients' browsers.

How you get the idea that HTML Encoded text only contains ampersands, semi-colons and alphanumerics after decoding?
I can really encode a "'" in HTML - and that is one of the things needed to get yo into trouble (as it is a string delimiter in SQL).
So, it works ONLY if you put the HTML encoded text into the database.
THEN you havequite some trouble with any text search... and presentation of readable text outside (like in SQL manager). I would consider that a really bad architected sitaution as you have not solved the issue just duct-taped away an obvious attack vector.
Numeric fields are still problematic, unless your HTML handling is perfect, which I would not assume given that workaround.
Use SQL parameters ;)

The single character that enables SQL injection is the SQL string delimer ', also known as hex 27 or decimal 39.
This character is represented in the same way in SQL and in HTML. So an HTML encode does not affect SQL injection attacks at all.

Related

Will sql injection works if ' " \ are filtered

I am studing security and ethical hacking. And I have noticed that most of sql injection attacks uses " ' \ will it work if I filtered these?

Most of SQL injection examples use the ;DROP TABLE students payload which is not even a thing in many software setups. This is just an example.
You are making a very common mistake, confusing an injection (a possibility to inject an unwanted code into the SQL query) with an exploit (the actual payload to be injected with a purpose of breaking into a system).
That's two completely different matters.
So, an injection is just a possibility. And it is irrelevant to any characters. Once injection is there, then an infinite number of exploits possible, all depends on the situation. Some of them will require anything but ' and \ symbols and some will need them.
What takeaway you can make from the statements above? One should protect from injections, not exploits. Fighting characters is a losing game. Fight the possibility.
Once an application is not protected from injections, it will be hacked, with one exploit or another, using one character or another. But once it is protected, it is protected from all exploits at once, no matter which character used.

Optimal set of charecters to disallow/allow that prevent SQL injection perfectly?

I know that sql injection must rely on certain set of special characters used in SQL. So here's the question:
What would be the minimal set of characters to disallow, or as large as possible, a wet to allow, that would prevent SQL injection perfectly?
I understand This is not the best nor the easiest approach to prevent SQL injection, but i just wonder questions like "can you do it even without xxx".
Answering a comment:
Just for the purpose of curiosity, some languages other than English can indeed be written normally without some special characters. for example：
Chinese:“”‘’，。！？：；——
Japanese:””’’、。！？：；ーー
English:""'',.!?：;--

You may want to look up some examples of the kinds of ways a SQL injection attack can be leveraged against a web application with weak security architecture. As far as the "WHY" you'll need to analyze the kinds of site user hacks that can get through what's supposed to be a locked down (but public facing) system.
I turned up this reference for further analysis:
You can see more at the source of this injection:
http://www.unixwiz.net/techtips/sql-injection.html
I know that sql injection must rely on certain set of special characters used in SQL. So here's the question:
There isn't any one character that is absolutely banned, nor are there others that are trusted 100% safe.
EDIT: Response to a revised OP...
A brief history of a type of SQL Injection vulnerability: Without developing data access APIs for your database inputs, the most common approach is "dynamically" constructing SQL statements, then executing the raw SQL (a dangerous mix of developer prepared code and outside user input).
Consider in pseudocode:
#SQL_QUERY = "SELECT NAME, SOCIAL_SECURITY_NUMBER, PASSWD, BANK_ACCOUNT_NUMBER" +
"FROM SECURE_TABLE " + "WHERE NAME = " + #USER_INPUT_NAME;
$CONNECT_TO SQL_DB;
EXECUTE #SQL_QUERY;
...
The expected input when obeyed is a lookup of specific user information based on a single name value input from some user form.
While it isn't my point to accidentally teach a how-to for a new generation of would-be SQL Injection hackers, you can guess that a long concatenated string of code isn't really secure if user input on a web form can also tag on anything they like to the code; including more code of their own.
The utility or danger with using dynamic sql coding is a discussion by itself. You can put a few barriers in front of a dynamic query such as this:
An "input handler" procedure that all inputs from app forms go through before the query code module is executed:
run_query( parse_and_clean( #input_value ) )
This is where you look for things like string inputs too long for expected sizes (a 500 character name???? non-alpha numerical name values?)
Web forms can also have their own scripted input handlers. Usually intended for validating for specific inputs (not empty, longer than three characters, no punctuation, format masks like "1-XXX-XXX-XXXX", etc.) They can also scrub for input hacks such as html coding tags, javascript, urls with remote calls, etc. Sometimes this need is best served here because the offending input is caught before it gets anywhere near the database.
False positives: more often it really is just user error sending malformatted input.
These examples date to the really early days of the Internet. There are lots of source code modules/libraries, add-ons, open-source and paid variants of attempts to make SQL injection much harder to break through into your host systems. Dig around; there is even a huge thread on Stack Overflow that discusses this exact topic.
For the time I have been stumbling around and developing for the Internet, I share a common observation: "perfectly" is a description of a system that has already been hacked. You won't get there, and that isn't the point. You should look towards being the "easy, quick target" (i.e., the Benz in the gas station with windows down, keys in ignition and owner in the restroom)
When you say "can you do it even without...?" The answer is "sure" if it's a "Hello World" type project for some classroom assignment... hosted on your local machine... behind your router/firewall. After all, you gotta learn somewhere. Just be smart where you deploy stuff that isn't so secured.
It's just when you put things out open to the "world" on the Internet, you should always be thinking how the "same task" has to be done differently with a security mindset. Same outputs as your test/dev workspace but with more precautions such as locking down obvious and commonly exploited weaknesses.
Enjoy your additional research, and hopefully this post (devoid of code examples) can encourage your mindset in helpful directions.
Onward.

I know that sql injection must rely on certain set of special characters used in SQL. So here's the question:
What would be the minimal set of characters to disallow, or as large as possible, a wet to allow, that would prevent SQL injection perfectly?
This is the wrong approach to this topic. Instead you need to understand why and how SQL injections happen. Then you can work out measures which prevent them.
Because the cause of SQL injections is the incorporation of user supplied data into SQL command. If the user supplied fragments are not processed properly, they could modify the intention of the SQL command.
Here the aspect of proper processing is crucial, which again depends on the developer’s intention of how the user provided data is supposed to be interpreted. In most cases it’s a literal value (e. g., string, number, date, etc.) and sometimes a SQL keyword (e. g., sort order ASC or DESC, operators, etc.) or even an identifier (e. g., column name, etc.).
These all follow certain syntax rules which can the user provided data can either be validated against and/or enforced:
Prepared statements are the first choice for literal values. They separate the SQL statement and parameter values so that latter can’t be mistakenly interpreted as something other than literal values.
White listing can always be used when the set of valid values it limited, which is the case for SQL keywords and identifiers.
So looking at the use cases for user provided data and the corresponding options that prevent SQL injections, there is no need to restrict the set of allowed characters globally. Better stick to the methods which are best practice and have proved effective.
And if you happen to have a use case that doesn’t fit the mentioned, e. g., allowing users to provide something more complex than an atomic language elements like an arbitrary filter expression, you should break them down into their atoms to be able to ensure their integrity.

Does using non-SQL databases obviate the need for guarding against "SQL injection"?

This may seem like an obvious (or not so obvious) question, but let me explain. I'm coding up a Google App Engine site using Google's database technology, BigTable. Any App Engine coders will know that Google has its own limited query language called GQL. As a result, I am tempted not to do any checking for SQL (or GQL) injection in my app since I assume Google is not using a raw string query on its backend methods to fetch data.
Furthermore, libraries for DB technologies like CouchDB, MongoDB, and other object or document (aka NoSQL) databases seem to obviate the need to check if a malicious user is injecting database manipulation commands. They often have libraries that directly map the objects in the database to object in your language of choice. I know there are many SQL libraries that do this as well, but I assume that at some level they are combining parameters to run a query over a string, and thusly I must still use SQL Injection protection even with those frameworks.
Am I being short-sighted? Or is it only a matter of time till the next great DB system takes hold and then I will see injection into those systems?

“Injection” holes are to do with text context mismatches. Every time you put a text string into another context of string you need to do encoding to fit the changed context. It seems seductively simple to blindly stuff strings together, but the difficulty of string processing is deceptive.
Databases with a purely object-based interface are immune to injection vulnerabilities, just like parameterised queries are in SQL. There is nothing an attacker can put in his string to break out of the string literal context in which you've put him.
But GQL specifically is not one of these. It's a string query language, and if you go concatenating untrusted unescaped material into a query like "WHERE title='%s'" % title, you're just as vulnerable as you were with full-on SQL. Maybe the limited capabilities of GQL make it more difficult to exploit that to completely compromise the application, but certainly not impossible in general, and in the very best case your application is still wrong and will fall over when people try to legitimately use apostrophes.
GQL has a parameter binding interface. Use it. Resist the allure of string hacking.

SQL-subsets like GQL obviously still concern themselves with it -- but pure non-SQL databases like CouchDB, Voldemort, etc should put & get data without concern for SQL-injection-style attacks.
That however does not excuse you from doing content validation, because while it might not break the database, it may break your application and allow things like XSS (if it is a web app).

Anytime data that is from or manipulated by user input is used to control the execution of code, there needs to be sanitization. I've seen cases where code used user input to execute a command without sanitizing the input. It hadn't been exploited, but if it had been it would have been a horrible attack vector.

SQl Injection is only a subset of a type of security flaw in which any uncontrolled input gets evaluated.
techincally, you could "inject" javascript, among others.

What's the canonical way to store arbitrary (possibly marked up) text in SQL?

What do wikis/stackoverflow/etc. do when it comes to storing text? Is the text broken at newlines? Is it broken into fixed-length chunks? How do you best store arbitrarily long chunks of text?

nvarchar(max) ftw. because over complicating simple things is bad, mmkay?

I guess if you need to offer the ability to store large chunks of text and you don't mind not being able to look into their content too much when querying, you can use CLobs.

This all depends on the RDBMS that you are using as well as the types of text that you are going to store. If the text is formatted into sizable chunks of data that mean something in and of themselves, like, say header/body, then you might want to break the data up into columns of these types. It may take multiple tables to use this method depending on the content that you are dealing with.
I don't know how other RDBMS's handle it, but I know that that it's not a good idea to have more than one open ended column in each table (text or varchar(max)). So you will want to make sure that only one column has unlimited characters.

Regarding PostgreSQL - use type TEXT or BYTEA. If you need to read random chunks you may consider large objects.

If you need to worry about keeping things like formatting strings, quotes, and other "cruft" in the text, as code would likely have, then the special characters need to be completely escaped first - otherwise on submission the db, they might end up causing an invalid command to be issued.
Most scripting languages have tools to do this built-in natively.

I guess it depends on where you want to store the text, if you need things like transactions etc.
Databases like SQL Server have a type that can store long text fields. In SQL Server 2005 this would primarily be nvarchar(max) for long unicode text strings. By using a database you can benefit from transactions and easy backup/restore assuming you are using the database for other things like StackOverflow.com does.
The alternative is to store text in files on disk. This may be fairly simple to implement and can work in environments where a database is not available or overkill.
Regards the format of the text that is stored in a database or file, it is probably very close to the input. If it's HTML then you would just push it through a function that would correctly escape it.
Something to remember is that you probably want to be using unicode or UTF-8 from creation to storage and vice-versa. This will allow you to support additional languages. Any problem with this encoding mechanism will corrupt your text. Historically people may have defaulted to ASCII based on the assumption they were saving disk space etc.

For SQL Server:
Use a varchar(max) to store. I think the upper limit is 2 GB.
Don't try to escape the text yourself. Pass the text through a parameterizing structure that will do the escapes properly for you. In .Net you'd add a parameter to a SqlCommand, or just use LinqToSQL (which then manages the SqlCommand for you).

I suspect StackOverflow is storing text in markdown format in arbitrarily-sized 'text' column. Maybe as UTF8 (but it might be UTF16 or something. I'm guessing it's SQL Server, which I don't know much about).
As a general rule you want to store stuff in your database in the 'rawest' form possible. That is, do all your decoding, and possibly cleaning, but don't do anything else with it (for example, if it's Markdown, don't encode it to HTML, leave it in its original 'raw' format)

Catching SQL Injection and other Malicious Web Requests

I am looking for a tool that can detect malicious requests (such as obvious SQL injection gets or posts) and will immediately ban the IP address of the requester/add to a blacklist. I am aware that in an ideal world our code should be able to handle such requests and treat them accordingly, but there is a lot of value in such a tool even when the site is safe from these kinds of attacks, as it can lead to saving bandwidth, preventing bloat of analytics, etc.
Ideally, I'm looking for a cross-platform (LAMP/.NET) solution that sits at a higher level than the technology stack; perhaps at the web-server or hardware level. I'm not sure if this exists, though.
Either way, I'd like to hear the community's feedback so that I can see what my options might be with regard to implementation and approach.

Your almost looking at it the wrong way, no 3party tool that is not aware of your application methods/naming/data/domain is going to going to be able to perfectly protect you.
Something like SQL injection prevention is something that has to be in the code, and best written by the people that wrote the SQL, because they are the ones that will know what should/shouldnt be in those fields (unless your project has very good docs)
Your right, this all has been done before. You dont quite have to reinvent the wheel, but you do have to carve a new one because of a differences in everyone's axle diameters.
This is not a drop-in and run problem, you really do have to be familiar with what exactly SQL injection is before you can prevent it. It is a sneaky problem, so it takes equally sneaky protections.
These 2 links taught me far more then the basics on the subject to get started, and helped me better phrase my future lookups on specific questions that weren't answered.
SQL injection
SQL Injection Attacks by Example
And while this one isnt quite a 100% finder, it will "show you the light" on existing problem in your existing code, but like with webstandards, dont stop coding once you pass this test.
Exploit-Me

The problem with a generic tool is that it is very difficult to come up with a set of rules that will only match against a genuine attack.
SQL keywords are all English words, and don't forget that the string
DROP TABLE users;
is perfectly valid in a form field that, for example, contains an answer to a programming question.
The only sensible option is to sanitise the input before ever passing it to your database but pass it on nonetheless. Otherwise lots of perfectly normal, non-malicious users are going to get banned from your site.

One method that might work for some cases would be to take the sql string that would run if you naively used the form data and pass it to some code that counts the number of statements that would actually be executed. If it is greater than the number expected, then there is a decent chance that an injection was attempted, especially for fields that are unlikely to include control characters such as username.
Something like a normal text box would be a bit harder since this method would be a lot more likely to return false positives, but this would be a start, at least.

One little thing to keep in mind: In some countries (i.e. most of Europe), people do not have static IP Addresses, so blacklisting should not be forever.

Oracle has got an online tutorial about SQL Injection. Even though you want a ready-made solution, this might give you some hints on how to use it better to defend yourself.

Now that I think about it, a Bayesian filter similar to the ones used to block spam might work decently too. If you got together a set of normal text for each field and a set of sql injections, you might be able to train it to flag injection attacks.

One of my sites was recently hacked through SQL Injection. It added a link to a virus for every text field in the db! The fix was to add some code looking for SQL keywords. Fortunately, I've developed in ColdFiusion, so the code sits in my Application.cfm file which is run at the beginning of every webpage & it looks at all the URL variables. Wikipedia has some good links to help too.

Interesting how this is being implemented years later by google and them removing the URL all together in order to prevent XSS attacks and other malicious acitivites

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas