Is SQL Injection/XSS attack possible with preg_replace? - sql

I have done some research on how injection/XSS attacks work. it seems like hackers simply make use of the USER INPUT fields to input codes.
However, suppose I restrict every USER INPUT fields with only alphanumerics(a-zA-Z0-9) with preg_replace, and lets assume that I use the soon-to-be-deprecated my_sql instead of PDO or my_sqli.
Would hackers still be able to inject/hack my website?
Thanks!

Short version: Don't do it.
Long version:
Suppose you have
SELECT * FROM my_table WHERE id = $user_input
If this happens, then some inputs (such as CURRENT_TIMESTAMP) are still possible, though the "attack" would be limited to the point of probably being harmless. The solution here could be to restrict the input to [0-9].
In Strings ("$user_input"), the problem shouldn't even exist.
However:
You have to make sure you implement your escape function correctly.
It is incredibly annoying for the end user. For instance, if this was a text field, why aren't white spaces allowed? What about รก? What if I want to quote someone with ""? Write a math expression with < (or even write something apparently harmless such as i <3 u)?
So now you have:
A homebrew solution, which has to be checked for correctness (and may have bugs, as any other function). Bugs in this function are potential security issues;
A solution which is unfamiliar to other programmers, who have to get used to it. Code without the usual escape functions is usually wrong code, so it's masssively surprising;
A solution that's fragile. What if someone else modifies your code and forgets to add the validation? What if you forget the validation?
You are focusing on solving a problem that's already been solved. Why waste time doing something that takes time to develop and is hard to maintain when others have already developed proper solutions that take close to no effort to use.
Finally, don't use deprecated APIs. Things are deprecated for a reason. Deprecated can mean stuff like "we'll drop support at any minute" or "this is has severe issues but we can't fix it for some reason".
Deprecated APIs are supposed to be used by legacy applications of developers that did not have enough time or resources to migrate. When starting from scratch, use the supported APIs.

Related

SQL Injection attempt, how does it work

I was looking at the logs and found a sql injection. Looks like it is used alot but I don't really understand how it works. I attempted to submit it through the form they submitted it through but nothing happens.
The injection string is:
(select(0)from(select(sleep(0)))v)/*''+(select(0)from(select(sleep(0)))v)+''"+(select(0)from(select(sleep(0)))v)+"*/
Can't figure out how they injected it. Didn't affect the server from what I can tell. They didn't get any data. But I still want to know how they made it work.
This is a vulnerability check. It's one of the easiest and safest way to figure out if your server is vulnerable to SQL injection - and more importantly, it doesn't need any attention from the would-be attacker! You can use a method like this to test sites automatically for SQL injection vulnerabilities - and in this case, it means that the potential attacker can run any kind of query or command, you seem to have no checks whatsoever. Needless to say, this is bad.
You should consider your server compromised - it's probably on someone's list now, pending further exploitation. Fix the issue ASAP, and ideally prevent the functionality altogether right away if the real fix is going to take some time.
The idea behind this is that a vulnerable server will respond differently to a query with different values for the sleep argument - this means that it's very easy to automatically go through all possible inputs (don't forget that even things like hidden fields and dropdowns can be changed at will) and find out if any of those are vulnerable. When this works, you can either inject a malicious query/command outright, or keep using the sleep to figure out information directly - particularly useful when there's no data you could make appear to the outside by modifying the vulnerable query. By series of yes-no questions (based on simple if(whatever, sleep(5), 0)) you can determine enough to press your attack further.

Good Use Cases of Comments

I've always hated comments that fill half the screen with asterisks just to tell you that the function returns a string, I never read those comments.
However, I do read comments that describe why something is done and how it's done (usually the single line comments in the code); those come in really handy when trying to understand someone else's code.
But when it comes to writing comments, I don't write that, rather, I use comments only when writing algorithms in programming contests, I'd think of how the algorithm will do what it does then I'd write each one in a comment, then write the code that corresponds to that comment.
An example would be:
//loop though all the names from n to j - 1
Other than that I can't imagine why anyone would waste valuable time writing comments when he could be writing code.
Am I right or wrong? Am I missing something? What other good use cases of comments am I not aware of?
Comments should express why you are doing something not what you are doing
It's an old adage, but a good metric to use is:
Comment why you're doing something, not how you're doing it.
Saying "loop through all the names from n to j-1" should be immediately clear to even a novice programmer from the code alone. Giving the reason why you're doing that can help with readability.
If you use something like Doxygen, you can fully document your return types, arguments, etc. and generate a nice "source code manual." I often do this for clients so that the team that inherits my code isn't entirely lost (or forced to review every header).
Documentation blocks are often overdone, especially is strongly typed languages. It makes a lot more sense to be verbose with something like Python or PHP than C++ or Java. That said, it's still nice to do for methods & members that aren't self explanatory (not named update, for instance).
I've been saved many hours of thinking, simply by commenting what I'd want to tell myself if I were reading my code for the first time. More narrative and less observation. Comments should not only help others, but yourself as well... especially if you haven't touched it in five years. I have some ten year old Perl that I wrote and I still don't know what it does anymore.
Something very dirty, that I've done in PHP & Python, is use reflection to retrieve comment blocks and label elements in the user interface. It's a use case, albeit nasty.
If using a bug tracker, I'll also drop the bug ID near my changes, so that I have a reference back to the tracker. This is in addition to a brief description of the change (inline change logs).
I also violate the "only comment why not what" rule when I'm doing something that my colleagues rarely see... or when subtlety is important. For instance:
for (int i = 50; i--; ) cout << i; // looping from 49..0 in reverse
for (int i = 50; --i; ) cout << i; // looping from 49..1 in reverse
I use comments in the following situations:
High-level API documentation comments, i.e. what is this class or function for?
Commenting the "why".
A short, high-level summary of what a much longer block of code does. The key word here is summary. If someone wants more detail, the code should be clear enough that they can get it from the code. The point here is to make it easy for someone browsing the code to figure out where some piece of logic is without having to wade through the details of how it's performed. Ideally these cases should be factored out into separate functions instead, but sometimes it's just not do-able because the function would have 15 parameters and/or not be nameable.
Pointing out subtleties that are visible from reading the code if you're really paying attention, but don't stand out as much as they should given their importance.
When I have a good reason why I need to do something in a hackish way (performance, etc.) and can't write the code more clearly instead of using a comment.
Comment everything that you think is not straightforward and you won't be able to understand the next time you see your code.
It's not a bad idea to record what you think your code should be achieving (especially if the code is non-intuitive, if you want to keep comments down to a minimum) so that someone reading it a later date, has an easier time when debugging/bugfixing. Although one of the most frustrating things to encounter in reading someone else's code is cases where the code has been updated, but not the comments....
I've always hated comments that fill half the screen with asterisks just to tell you that the function returns a string, I never read those comments.
Some comments in that vein, not usually with formatting that extreme, actually exist to help tools like JavaDoc and Doxygen generate documentation for your code. This, I think, is a good form of comment, because it has both a human- and machine-readable format for documentation (so the machine can translate it to other, more useful formats like HTML), puts the documentation close to the code that it documents (so that if the code changes, the documentation is more likely to be updated to reflect these changes), and generally gives a good (and immediate) explanation to someone new to a large codebase of why a particular function exists.
Otherwise, I agree with everything else that's been stated. Comment why, and only comment when it's not obvious. Other than Doxygen comments, my code generally has very few comments.
Another type of comment that is generally useless is:
// Commented out by Lumpy Cheetosian on 1/17/2009
...uh, OK, the source control system would have told me that. What it won't tell me is WHY Lumpy commented out this seemingly necessary piece of code. Since Lumpy is located in Elbonia, I won't be able to find out until Monday when they all return from the Snerkrumph holiday festival.
Consider your audience, and keep the noise level down. If your comments include too much irrelevant crap, developers will just ignore them in practice.
BTW: Javadoc (or Doxygen, or equiv.) is a Good Thing(tm), IMHO.
I also use comments to document where a specific requirement came from. That way the developer later can look at the requirement that caused the code to be like it was and, if the new requirement conflicts with the other requirment get that resolved before breaking an existing process. Where I work requirments can often come from different groups of people who may not be aware of other requirements then system must meet. We also get frequently asked why we are doing a certain thing a certain way for a particular client and it helps to be able to research to know what requests in our tracking system caused the code to be the way it is. This can also be done on saving the code in the source contol system, but I consider those notes to be comments as well.
Reminds me of
Real programmers don't write documentation
I wrote this comment a while ago, and it's saved me hours since:
// NOTE: the close-bracket above is NOT the class Items.
// There are multiple classes in this file.
// I've already wasted lots of time wondering,
// "why does this new method I added at the end of the class not exist?".

When to join name and when not to?

Most languages give guidelines to separate different words of a name by underscores (python, C etc.) or by camel-casing (Java). However the problem is when to consider the names as separate. The options are:
1) Do it at every instance when separate words from the English dictionary occur e.g. create_gui(), recv_msg(), createGui(), recvMsg() etc.
2) Use some intuition to decide when to do this and when not to do this e.g. recvmsg() is OK, but its better to have create_gui() .
What is this intuition?
The question looks trivial. But it presents a problem which is common and takes at least 5 seconds for each instance whenever it appears.
I always do your option 1, and as far as I can tell, all modern frameworks do.
One thing that comes to mind that just sticks names together is the standard C library. But its function names are often pretty cryptic anyway.
I'm probably biased as an Objective-C programmer, where things tend to be quite spelled out, but I'd never have a method like recvMsg. It would be receiveMessage (and the first parameter should be of type Message; if it's a string, then it should be receiveString or possibly receiveMessageString depending on context). When you spell things out this way, I think the question tends to go away. You would never say receivemessage.
The only time I abbreviate is when the abbreviation is more clear than the full version. createGUI is good because "GUI" (gooey) is the common way we say it in English. createGraphicalUserInterface is actually more confusing, so should be avoided.
So to the original question, I believe #1 is best, but coupled with an opposition to unclear abbreviations.
One of the most foolish naming choices ever made in Unix was creat(), making a nonsense word to save one keystroke. Code is written once and read many times, so it should be biased towards ease of reading rather than writing.
For me, and this is just me, I prefer to follow whatever is conventional for the language, thus camelCase for Java and C++, underscore for C and SQL.
But whatever you do, be consistent within any source file or project. The reader of your code will thank you; seeing an identifier that is inconsistent with most others makes the reader pause and ask "is something different going on with this identifier? Is there something here I should be noticing?"
Or in other words, follow the Principal of Least Surprise.
Edit: This got downmodded why??
Just follow coding style, such moments usually well described.
For example:
ClassNamesInCamelNotaionWithFirstLetterCapitalized
classMethod()
classMember
CONSTANTS_IN_UPPERCASE_WITH_UNDERSCORE
local_variables_in_lowercase_with_underscores

Languages other than SQL in postgres

I've been using PostgreSQL a little bit lately, and one of the things that I think is cool is that you can use languages other than SQL for scripting functions and whatnot. But when is this actually useful?
For example, the documentation says that the main use for PL/Perl is that it's pretty good at text manipulation. But isn't that more of something that should be programmed into the application?
Secondly, is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea on a production system.
PS. Bonus points if someone can make PL/LOLCODE seem useful.
#Mike: this kind of thinking makes me nervous. I've heard to many times "this should be infinitely portable", but when the question is asked: do you actually foresee that there will be any porting? the answer is: no.
Sticking to the lowest common denominator can really hurt performance, as can the introduction of abstraction layers (ORM's, PHP PDO, etc). My opinion is:
Evaluate realistically if there is a need to support multiple RDBMS's. For example if you are writing an open source web application, chances are that you need to support MySQL and PostgreSQL at least (if not MSSQL and Oracle)
After the evaluation, make the most of the platform you decided upon
And BTW: you are mixing relational with non-relation databases (CouchDB is not a RDBMS comparable with Oracle for example), further exemplifying the point that the perceived need for portability is many times greatly overestimated.
"isn't that [text manipulation] more of something that should be programmed into the application?"
Usually, yes. The generally accepted "three-tier" application design for databases says that your logic should be in the middle tier, between the client and the database. However, sometimes you need some logic in a trigger or need to index on a function, requiring that some code be placed into the database. In that case all the usual "which language should I use?" questions come up.
If you only need a little logic, the most-portable language should probably be used (pl/pgSQL). If you need to do some serious programming though, you might be better off using a more expressive language (maybe pl/ruby). This will always be a judgment call.
"is there any valid reason to use an untrusted language?"
As above, yes. Again, putting direct file access (for example) into your middle tier is best when possible, but if you need to fire things off based on triggers (that might need access to data not available directly to your middle tier), then you need untrusted languages. It's not ideal, and should generally be avoided. And you definitely need to guard access to it.
These days, any "unique" or "cool" feature in a DBMS makes me incredibly nervous. I break out in a rash and have to stop work until the itching goes away.
I just hate to be locked in to a platform unnecessarily. Suppose you build a big chunk of your system in PL/Perl inside the database. Or in C# within SQL Server, or PL/SQL within Oracle, there are plenty of examples*.
Now you suddenly discover that your chosen platform doesn't scale. Or isn't fast enough. Or something. Worse, there's a new kid on the database block (something like MonetDB, CouchDB, Cache, say but much cooler) that would solve all your problems (even if your only problem, like mine, is having an uncool databse platform). And you can't switch to it without recoding half your application.
(*Admittedly, the paid-for products are to some extent seeking to lock you in by persuading you to use their unique features, which is not an accusation that can directly be levelled at the free providers, but the effect is the same).
So that's a rant on the first part of the question. Heart-felt, though.
is there any valid reason to use an
untrusted language? It seems like
making it so that any user can execute
any operation would be a bad idea
My goodness, yes it does! A sort of "Perl injection attack"? Almost worth doing it just to see what happens, I'd have thought.
For philosophical reasons outlined above I think I'll pass on the PL/LOLCODE challenge. Although I was somewhat amazed to discover it was a link to something extant.
From my perspective, I guess the answer is 'it depends'.
There is an argument that manipulation of the data belongs in the database layer, so that the business logic does not need to be overly concerned about how the manipulation happens, it just knows that it has.
Another very good reason to process data on the db layer is if the volume of data being crunched means that network bandwidth will become an issue. I once had to categorise very large amounts of data. Processing this in the application layer was severly restricted by the time required to transfer all the data across the network for processing.
I then wrote a binning algorithm in PL/pgSQL and it worked much faster.
Regarding untrusted languages, I heard a podcast from Josh Berkus (a postgres advocate) who discussed an application of postgresql that brought in data from MySQL as part of its processing, so that the communication itself was handled by the postgres server. I don't remember the full details, I think it was on the FLOSS Weekly podcast which is quite an interesting discussion of the history of PostGRESQL and some of the issues it is put to.
The untrusted versions of the procedural languages allow you to access I/O on the system.
This can come in handy if you need a trigger or something send a email or connect to a socket server to send a popup notification. There are tons of uses for this type of thing, and because of postgresql isolation levels you cans safely do things like this.
You can put checkpoints in the function so if the transaction fails the email or whatever won't go out. The nice thing about doing this is it removes the logic from the client and puts it on the server.
I think most additional languages are offered so that if you develop in that language on a regular basis, you can feel comfortable writing db functions, triggers, etc. The usefulness of these features is to provide a control over data as close to the data as possible.
An example of a useful stored procedure I recently wrote in an external language that would not have been possible in pl/sql is a version of 'df' which allowed SQL table generators to pick a tablespace with the most free space available at runtime.
I used plperlu, and it was relatively simple, although I had to be careful with data typing.

Catching SQL Injection and other Malicious Web Requests

I am looking for a tool that can detect malicious requests (such as obvious SQL injection gets or posts) and will immediately ban the IP address of the requester/add to a blacklist. I am aware that in an ideal world our code should be able to handle such requests and treat them accordingly, but there is a lot of value in such a tool even when the site is safe from these kinds of attacks, as it can lead to saving bandwidth, preventing bloat of analytics, etc.
Ideally, I'm looking for a cross-platform (LAMP/.NET) solution that sits at a higher level than the technology stack; perhaps at the web-server or hardware level. I'm not sure if this exists, though.
Either way, I'd like to hear the community's feedback so that I can see what my options might be with regard to implementation and approach.
Your almost looking at it the wrong way, no 3party tool that is not aware of your application methods/naming/data/domain is going to going to be able to perfectly protect you.
Something like SQL injection prevention is something that has to be in the code, and best written by the people that wrote the SQL, because they are the ones that will know what should/shouldnt be in those fields (unless your project has very good docs)
Your right, this all has been done before. You dont quite have to reinvent the wheel, but you do have to carve a new one because of a differences in everyone's axle diameters.
This is not a drop-in and run problem, you really do have to be familiar with what exactly SQL injection is before you can prevent it. It is a sneaky problem, so it takes equally sneaky protections.
These 2 links taught me far more then the basics on the subject to get started, and helped me better phrase my future lookups on specific questions that weren't answered.
SQL injection
SQL Injection Attacks by Example
And while this one isnt quite a 100% finder, it will "show you the light" on existing problem in your existing code, but like with webstandards, dont stop coding once you pass this test.
Exploit-Me
The problem with a generic tool is that it is very difficult to come up with a set of rules that will only match against a genuine attack.
SQL keywords are all English words, and don't forget that the string
DROP TABLE users;
is perfectly valid in a form field that, for example, contains an answer to a programming question.
The only sensible option is to sanitise the input before ever passing it to your database but pass it on nonetheless. Otherwise lots of perfectly normal, non-malicious users are going to get banned from your site.
One method that might work for some cases would be to take the sql string that would run if you naively used the form data and pass it to some code that counts the number of statements that would actually be executed. If it is greater than the number expected, then there is a decent chance that an injection was attempted, especially for fields that are unlikely to include control characters such as username.
Something like a normal text box would be a bit harder since this method would be a lot more likely to return false positives, but this would be a start, at least.
One little thing to keep in mind: In some countries (i.e. most of Europe), people do not have static IP Addresses, so blacklisting should not be forever.
Oracle has got an online tutorial about SQL Injection. Even though you want a ready-made solution, this might give you some hints on how to use it better to defend yourself.
Now that I think about it, a Bayesian filter similar to the ones used to block spam might work decently too. If you got together a set of normal text for each field and a set of sql injections, you might be able to train it to flag injection attacks.
One of my sites was recently hacked through SQL Injection. It added a link to a virus for every text field in the db! The fix was to add some code looking for SQL keywords. Fortunately, I've developed in ColdFiusion, so the code sits in my Application.cfm file which is run at the beginning of every webpage & it looks at all the URL variables. Wikipedia has some good links to help too.
Interesting how this is being implemented years later by google and them removing the URL all together in order to prevent XSS attacks and other malicious acitivites