Why do SQL errors not show you the error source?

Is it possible to find the line or column where an error is occurring when executing SQL code in Oracle SQL developer?
For example, imagine you are running a very simple line of code
SELECT * FROM employeesTbl WHERE active = 1
But for some reason, active is a VARCHAR column and someone has entered the value ";!/asd02" into it.
You will only get an ORA- error, and it does not tell you which row caused it.
Does anyone know why this is?

The reason behind this is that, in general, developer support in SQL, PL/SQL and the like is really abysmal. One result is a really broken exception concept in PL/SQL, almost useless exceptions in (Oracle) SQL, and little hope that it is better in any other RDBMS.
I think the reason behind all that is that databases are persistent beasts (pun intended). Many companies and developers change their preferred main development language from time to time (C, C++, VB, Java, C#, Groovy, Scala ..). But they rarely change the database, possibly because the old databases will still be around with no chance to migrate them.
This in turn means most DB developers know only a single database system reasonably well, so they don't see what is possible in other systems. Therefore there is little to no pressure to make database systems any more usable for developers.

Multiple rows may contain errors. For the system to be consistent (as a "set-based" language), it ought to return you all rows which contain errors - and not all row errors may be caused by the same error.
However, it could be computationally expensive to compute this entire error set - and the system "knows" that any further computation on this query is going to result in failure anyway - so it represents wasted resources when other queries could be running successfully.
I agree that it would be nice to turn on this type of reporting as an option (especially in non-production environments), but no database vendor seems to have done so.

You get an error because the field is a character type and you're treating it as a number, which you shouldn't be doing. If you want the field to be numeric, then you have to have a numeric field! As a general rule, every non-character column should use the correct data type to avoid this kind of problem.
I'm not certain why Oracle doesn't tell you which row caused the error; it may be physically possible using the rowid in a simple select such as the one you have here. If you're joining tables or using conversion functions such as to_number, it would become a lot more difficult, if possible at all.
I would imagine that Oracle did not want to implement something only partially, especially when this is not an Oracle error but a coding error.
To sort out the problem create the following function:
create or replace function is_number( Pvalue varchar2
                                    ) return number is
   /* Test whether a value is a number. Return a number
      rather than a plain boolean so it can be used in
      SQL statements as well as PL/SQL.
   */
   l_number number;
begin
   -- Explicitly convert.
   l_number := to_number(Pvalue);
   return 1;
exception when others then
   return 0;
end;
/
Run the following to find your problem rows:
SELECT * FROM employeesTbl WHERE is_number(active) = 0
Or this to ignore them:
SELECT *
FROM ( SELECT *
FROM employeesTbl
WHERE is_number(active) = 1 )
WHERE active = 1
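
The same idea can be tried outside Oracle. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in database; the sample rows and the Python version of is_number are assumptions made for illustration. SQLite does not raise a conversion error the way Oracle does, so this only shows the "flag the rows that do not parse" technique, not the ORA- error itself. Python's float() is also more lenient than to_number (it accepts values such as "inf"), so treat the check as approximate.

```python
import sqlite3

def is_number(value):
    """Return 1 if value parses as a number, else 0 (mirrors the PL/SQL helper)."""
    try:
        float(value)
        return 1
    except (TypeError, ValueError):
        return 0

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the same name as the PL/SQL one.
conn.create_function("is_number", 1, is_number)
conn.execute("CREATE TABLE employeesTbl (id INTEGER PRIMARY KEY, active TEXT)")
conn.executemany(
    "INSERT INTO employeesTbl (id, active) VALUES (?, ?)",
    [(1, "1"), (2, "0"), (3, '";!/asd02'), (4, "1")],  # hypothetical sample data
)

# Rows whose "active" value cannot be read as a number.
bad_rows = conn.execute(
    "SELECT id, active FROM employeesTbl WHERE is_number(active) = 0 ORDER BY id"
).fetchall()

# Rows that are numeric and equal to 1, ignoring the bad ones.
active_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM employeesTbl "
        "WHERE is_number(active) = 1 AND CAST(active AS INTEGER) = 1 ORDER BY id"
    )
]
print(bad_rows, active_ids)
```

Row 3 is the only one reported as bad, which is the per-row information the ORA- error leaves out.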


Looking for an explanation of this attempted SQL injection query

Looking through my logs I found the following query string as an attempt to perform a SQL injection, probably from an automated tool:
(select*from(select+sleep(10)union/**/select+1)a)
From what I can tell, it’s attempting a timing based attack to see if any of the tables in my database start with “a” - the sleep function will only run if the union query matches something? But I am a bit confused about other parts of the attack:
Why are there plus signs between parts of the query?
Why is there a comment as part of the query string?
Would be interested in any answers - I’m fairly certain my site hasn’t been compromised as I haven’t scanned further activity on that query and can’t get it to execute myself, so just wondering if my intuition was correct. Cheers!
I don't know what the point of this is, nor what the point is of trying to figure out the point. Injections are easier to block than to reverse engineer, and the latter doesn't contribute much to the former.
The point of the + and the /**/ are probably pretty much the same, they separate tokens without the use of whitespace. Presumably someone thinks whitespace is going to trigger some kind of alarm or blockage.
The 'a' is just an alias, and is probably there to avoid the error 'ERROR: subquery in FROM must have an alias'
This won't work in stock PostgreSQL because there is no function spelled sleep. They might be targeting a different DBMS, or maybe PostgreSQL with a specific app/framework in use which creates its own sleep function.
The sleep is probably there in case the system doesn't return meaningful messages to the end user. If it takes 10 seconds to get a response, then you know the sleep got executed. If it immediately returns, you know it didn't execute, but don't know why it didn't.
This is meant to detect a SQL injection (probably through an HTTP request parameter) via a timing attack. The inserted comments (as other people have mentioned) are meant to remove whitespace while still allowing the query to parse, in an attempt to fool custom (badly designed) sanitization. The "+" is likely meant to be decoded into a space after passing through URL decoding.
If you replace the whitespace and add indentation it's easier to see what's going on:
select *              <-- match any number of columns on the original query
from
  (select             <-- nested sub-query in the from clause
     sleep(10)        <-- timing attack meant to detect whether the SQL ran
   union              <-- not sure why the union is needed
   select 1) a        <-- alias the subquery to "a"
)                     <-- close off matching parens in injected SQL?
I don't think this is attempting to look for tables that start with a, simply run a sleep on a possible recursive query, which could cause your database trouble, if a bunch of them execute.
The + signs are likely an attempt to do some string concatenation... That would be my guess
Regardless, I would strongly recommend tracing back where this originated from and sanitizing the inputs on your site so that raw inputs (potential SQL) are not being dropped into queries.
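
The two explanations given for the "+" and the "/**/" can be checked quickly. This is a rough sketch in Python, assuming the string arrived in a URL query parameter; SQLite (via the built-in sqlite3 module) stands in for whatever DBMS the attacker had in mind, and the literals 2 and 1 replace the sleep call because SQLite has no sleep function.

```python
import sqlite3
from urllib.parse import unquote_plus

# The string exactly as it appeared in the logs.
raw = "(select*from(select+sleep(10)union/**/select+1)a)"

# Form/URL decoding turns each "+" into a space.
decoded = unquote_plus(raw)
print(decoded)

# A comment separates tokens just like whitespace does, so this query
# parses even though it contains no spaces around "union".
conn = sqlite3.connect(":memory:")
rows = sorted(conn.execute("select*from(select 2 union/**/select 1)a").fetchall())
print(rows)
```

So after decoding, the payload is ordinary SQL with the whitespace restored, and the comment is only there as a separator.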

'-999' used for all condition

I have a sample of a stored procedure like this (from my previous working experience):
Select * from table where (id=@id or id='-999')
Based on my understanding of this query, the '-999' is used to avoid an exception when no value is passed in by the user. So far in my research, I have not found this usage on the internet or in other companies' implementations.
@id is passed in by the user.
Any help will be appreciated in providing some links related to it.
I'd like to add my two guesses on this, although please note that to my disadvantage, I'm one of the very youngest in the field, so this is not coming from that much of history or experience.
Also, please note that for any reason anybody provides you, you might not be able to confirm it 100%. Your oven might just not have any leftover evidence in and of itself.
Now, per another question I read before, extreme integers were used in some systems to denote missing values, since text and NULL weren't options at those systems. Say I'm looking for ID#84, and I cannot find it in the table:
Not Found Is Unlikely:
Perhaps in some systems it's far more likely that a record exists with a missing/incorrect ID, than to not be existing at all? Hence, when no match is found, designers preferred all records without valid IDs to be returned?
This however has a few problems. First, depending on the design, user might not recognize the results are a set of records with missing IDs, especially if only one was returned. Second, current query poses a problem as it will always return the missing ID records in addition to the normal matches. Perhaps they relied on ORDERing to ease readability?
Exception Above SQL:
AFAIK, SQL is fine with a zero-row result, but maybe whatever thing that calls/used to call it wasn't as robust, and something goes wrong (hard exception, soft UI bug, etc.) when zero rows are returned? Perhaps then, this ID represented a dummy row (e.g. blanks and zeroes) to keep things running.
Then again, this also suffers from the same arguments above regarding "record is always outputted" and ORDER, with the added possibility that the SQL-caller might have dedicated logic to when the -999 record is the only record returned, which I doubt was the most practical approach even in whatever era this was done at.
... the more I type, the more I think this is the oven, and only the great grandmother can explain this to us.
If you want to avoid an exception when no value is passed in by the user, declare the stored procedure parameter with a default of null, like @id int = null.
For instance:
CREATE PROCEDURE [dbo].[TableCheck]
    @id int = null
AS
BEGIN
    Select * from table where (id=@id)
END
Now you can execute it either way:
exec [dbo].[TableCheck] 2 or exec [dbo].[TableCheck]
Remember, it's a separate matter if you want to return the whole table when the input parameter is null.
As for your id = -999 condition, I tried it your way. It doesn't prevent any exception.
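
To make the difference concrete, here is a small sketch using Python's sqlite3 module rather than SQL Server. The items table, its rows and both function names are invented for the example. It compares the sentinel condition from the question with an "optional parameter" condition that returns the whole table when no id is given, which is one possible reading of the "separate matter" mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "one"), (2, "two"), (-999, "placeholder")],  # hypothetical rows
)

def find_with_sentinel(item_id):
    # Same shape as the query in the question: the -999 row always matches.
    return conn.execute(
        "SELECT id FROM items WHERE (id = ? OR id = -999) ORDER BY id",
        (item_id,),
    ).fetchall()

def find_optional(item_id=None):
    # Optional-parameter pattern: no id given means no filter at all.
    return conn.execute(
        "SELECT id FROM items WHERE (? IS NULL OR id = ?) ORDER BY id",
        (item_id, item_id),
    ).fetchall()

print(find_with_sentinel(2))     # the match plus the -999 row
print(find_with_sentinel(None))  # only the -999 row
print(find_optional(2))          # only the match
print(find_optional())           # every row
```

With the sentinel version the -999 row comes back on every call, which matches the concern raised earlier that the record "is always outputted".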

Why are dot-separated prefixes ignored in the column list for INSERT statements?

I've just come across some SQL syntax that I thought was invalid, but actually works fine (in SQL Server at least).
Given this table:
create table SomeTable (FirstColumn int, SecondColumn int)
The following insert statement executes with no error:
insert SomeTable(Any.Text.Here.FirstColumn, It.Does.Not.Matter.What.SecondColumn)
values (1,2);
The insert statement completes without error, and checking select * from SomeTable shows that it did indeed execute properly. See fiddle: http://sqlfiddle.com/#!6/18de0/2
SQL Server seems to just ignore anything except the last part of the column name given in the insert list.
Actual question:
Can this be relied upon as documented behaviour?
Any explanation about why this is so would also be appreciated.
It's unlikely to be part of the SQL standard, given its dubious utility (though I haven't checked specifically (a)).
What's most likely happening is that it's throwing away the non-final part of the column specification because it's superfluous. You have explicitly stated what table you're inserting into, with the insert into SomeTable part of the command, and that's the table that will be used.
What you appear to have done here is to find a way to execute SQL commands that are less readable but have no real advantage. In that vein, it appears similar to the C code:
int nine = 9;
int eight = 8;
xyzzy = xyzzy + nine - eight;
which could perhaps be better written as xyzzy++; :-)
I wouldn't rely on it at all, possibly because it's not standard but mostly because it makes maintenance harder rather than easier, and because I know DBAs all over the world would track me down and beat me to death with IBM DB2 manuals, their choice of weapon due to the voluminous size and skull-crushing abilities :-)
(a) I have checked non-specifically, at least for ISO 9075-2:2003 which dictates the SQL03 language.
Section 14.8 of that standard covers the insert statement and it appears that the following clause may be relevant:
Each column-name in the insert-column-list shall identify an updatable column of T.
Without spending a huge amount of time (that document is 1,332 pages long and would take several days to digest properly), I suspect you could argue that the column could be identified just using the final part of the column name (by removing all the owner/user/schema specifications from it).
Especially since it appears only one target table is possible (updatable views crossing table boundaries notwithstanding):
<insertion target> ::= <table name>
Fair warning: I haven't checked later iterations of the standard so things may have changed. But I'd consider that unlikely since there seems to be no real use case for having this feature.
This was reported as a bug on Connect and despite initially encouraging comments the current status of the item is closed as "won't fix".
The Order by clause used to behave in a similar fashion but that one was fixed in SQL Server 2005.
I get errors when I try to run the script on SQL Server 2012 as well as SQL Server 2014 and SQL Server 2008 R2. So you can certainly not rely on the behavior you see with sqlfiddle.
Even if this were to work, I would never rely on undocumented behavior in production code. Microsoft will give notice of breaking changes to documented features but not to undocumented ones. So if this were an actual T-SQL parsing bug that was later fixed, the fix would break the malformed code.

cursor_sharing parameter in Oracle

I would like to know the tradeoffs of setting the cursor_sharing parameter in Oracle to "FORCE".
Since this would make Oracle try to soft-parse any SQL statement, performance should surely improve.
But the default value is "EXACT", so I would like to know if there is any danger in setting it to FORCE or SIMILAR.
Unless you really know what you're doing, I'd recommend not changing this setting.
Usually, if you've got a high number of hard parses, it's an indication of bad application design.
A typical example for selecting all products for a given category (pseudocode):
stmt = 'select * from products where category = ' || my_category
results = stmt.execute
This is flawed for a number of reasons:
it creates a different SQL statement for each category, therefore increasing the number of hard parses dramatically
it is vulnerable to SQL injection attacks
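
The first point can be illustrated without Oracle. The sketch below uses Python's sqlite3 module with an invented products table; it only counts how many distinct SQL texts each style produces, on the assumption that a database keyed on statement text (as Oracle's shared cursors are with cursor_sharing = exact) would have to hard-parse each distinct text once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Dune", "books"), ("Lego", "toys"), ("Chess", "games")],  # sample data
)

categories = ["books", "toys", "games"]
literal_texts, bound_texts = set(), set()
literal_results, bound_results = [], []

for category in categories:
    # Concatenation: a new statement text for every category value.
    literal_sql = "select name from products where category = '" + category + "'"
    literal_texts.add(literal_sql)
    literal_results.append(conn.execute(literal_sql).fetchall())

    # Bind parameter: one statement text, the value travels separately.
    bound_sql = "select name from products where category = ?"
    bound_texts.add(bound_sql)
    bound_results.append(conn.execute(bound_sql, (category,)).fetchall())

print(len(literal_texts), len(bound_texts))
```

Both styles return the same rows, but the concatenated version yields one statement text per category while the bound version yields a single text for all of them.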
A good application runs perfectly well with cursor_sharing = exact. A good application can use literals for specific reasons, for example to select orders with state = new. That use of literals is OK. If the application uses literals to identify an order by ID, it is a different matter, since there will be many different order IDs.
Best is to clean up the app to use literals in the correct way, or to start using prepared statements for best performance.
If you happen to have an application that only uses literals, set cursor_sharing to FORCE. In 11g there are mechanisms such as cardinality feedback that can adjust an execution plan based on unexpected row counts returned by a query, so that the plan originally chosen for the query is corrected for the next time it is used.

Is it a bad idea to use SQLExecDirect with preformatted query string instead of SQLPrepare+SQLBindParameter+SQLExecute?

OK, at the first glance, it seems that it must be more efficient to use SQLPrepare+SQLBindParameter+SQLExecute than format string (e.g. with CString::Format) and pass the whole complete query string to SQLExecDirect. If not, why would there exist the second method (SQLPrepare+SQLBindParameter+SQLExecute) at all?
BUT... here is what I think: the driver has to convert the parameters (the ones I feed it with SQLBindParameter) into a string representation sooner or later (I suspect later, but anyway...), right? (Or maybe not?) So if I do this formatting in my application (printf-like formatting), will I lose any performance?
One thing I suspect is that when the connection is over the network, passing parameters as raw data and formatting them at the server end might decrease the network traffic compared with passing preformatted query strings, but let's ignore the network traffic for a moment. Apart from that, is there any performance gain in using SQLPrepare+SQLBindParameter+SQLExecute instead of formatting the full query string in the application and then using SQLExecDirect?
For me, using SQLExecDirect is simpler and more convenient, so I need a good answer on whether I should switch to the other approach.
Important: if you say that the SQLPrepare+SQLBindParameter+SQLExecute approach gives better performance, I'd like to know how much! I don't want theoretical assumptions; I'd like to know when it is worth it in practice. My current use case is not very db-intensive, I won't have more than 100 inserts/updates per second, so is it OK to use SQLExecDirect? In what scenarios, if ever, do I have to use SQLPrepare+SQLBindParameter+SQLExecute?
If you are inserting or updating with the same SQL (excluding parameters) repeatedly then SQLPrepare, SQLBindParameter and SQLExecute will be faster than SQLExecDirect every time. Consider:
/* pseudocode: handles and most ODBC arguments are omitted */
SQLPrepare("insert into mytable (cola, colb) values(?,?);");
for (n = 0; n < 10000; n++) {
    SQLBindParameter(1, n);
    SQLBindParameter(2, n);
    SQLExecute();
}
and
for (n = 0; n < 10000; n++) {
    char sql[1000];
    sprintf(sql, "insert into mytable (cola, colb) values(%d,%d)", n, n);
    SQLExecDirect(sql);
}
In the first example, the statement is prepared once and hence the db engine only has to parse it once and work out an execution plan once. In the second example the sql and parameters are passed every time and the SQL looks different every time so it is parsed each time.
In addition, in the first example you can use arrays of parameters to pass multiple rows of parameters in one go - see SQL_PARAMSET_SIZE.
See 3.1.2 Inserting data for a worked example and an indication of how much time you can save.
Ignore network traffic, you'll just be second guessing what happens under the hood in the driver.
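
For anyone who wants to experiment with the pattern without setting up ODBC, here is a rough analogue in Python's sqlite3 module. This is not the ODBC API: executemany is used here as a stand-in for "prepare once, bind and execute many times", and the table mirrors the mytable example above. No timings are claimed; measure on your own driver and database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (cola INTEGER, colb INTEGER)")

# One statement text with placeholders, executed once per parameter tuple.
insert_sql = "insert into mytable (cola, colb) values (?, ?)"
conn.executemany(insert_sql, ((n, n) for n in range(10000)))
conn.commit()

count, total = conn.execute(
    "select count(*), sum(cola) from mytable"
).fetchone()
print(count, total)
```

The statement text never changes, so the engine has no reason to treat each insert as a new statement, which is the point of the first C example.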
ADDITION:
Regarding your description of what happens with parameters where you seem to think the driver will convert them to strings; the other advantage of binding parameters is you can provide them in one type and ask the driver to use them as another type. You may find you'll come across a parameter type which cannot easily be represented as a string without adding some sort of conversion function which could be avoided with a parameter.
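
A small illustration of that last point, again sketched with Python's sqlite3 module rather than ODBC (the table and values are invented): a binary value containing a NUL byte and quote characters, plus a NULL, are passed as bound parameters with no string formatting or escaping at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (id INTEGER, payload BLOB, note TEXT)")

# Bytes that would be awkward to embed in a SQL string literal:
# NUL, 0xFF, a single quote, a double quote and a newline.
payload = bytes([0, 255, 39, 34, 10])

# Bound as-is; None is stored as SQL NULL.
conn.execute("INSERT INTO blobs VALUES (?, ?, ?)", (1, payload, None))

stored_payload, stored_note = conn.execute(
    "SELECT payload, note FROM blobs WHERE id = ?", (1,)
).fetchone()
print(stored_payload == payload, stored_note is None)
```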
Yes, it's a bad idea, and for two reasons:
Performance
SQLPrepare causes the SQL statement to be parsed, and depending on the SQL statement that can be time consuming. If you're using a DB on another server, the statement might get sent to it for parsing. Even if the parsing takes only, say, 10% of the time of executing your whole query, you save that time from the second execution of the statement onwards. That may be the case when you're inserting multiple rows, or running the same "select" again.
Of course the SQL statement passed must always be a static string. Some SQL frameworks may even do prepared statement caching for you. I don't know if ODBC does this. If you want to have real performance numbers, you have to measure for yourself - every query is different (and even might depend on the table contents, too).
SQL Injection
No matter where you say the data comes from that you're formatting with CString::Format or any other method, you might be at risk of SQL injection. Even if you're only using strings from your own sources, at some later point you or someone else may change the code to accept data from outside, and then you're vulnerable to SQL injection. If you need more info about SQL injection, just search StackOverflow; I'm sure there are some good questions about it.
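
A minimal demonstration of the risk, using Python's sqlite3 module instead of ODBC and CString::Format; the users table and the malicious input are made up for the example. The same input is first formatted into the query string and then passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],  # hypothetical rows
)

# Input that closes the string literal and adds an always-true condition.
user_input = "nobody' OR '1'='1"

# printf-style formatting: the input becomes part of the SQL text.
formatted_sql = "SELECT name FROM users WHERE name = '%s'" % user_input
leaked = conn.execute(formatted_sql).fetchall()

# Bound parameter: the input is only ever treated as a value.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(leaked), safe)
```

The formatted query returns every user, while the bound version returns nothing because no user has that literal name.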