I know that I should use it when I deal with data of TEXT type (and I guess the ones that fall back to TEXT), but is it the only case?
Example:
UPDATE names SET name='Mike' WHERE id=3
I'm writing an SQL query auto generation in C++, so I want to make sure I don't miss cases, when I have to add quotes.
Single quotes (') denote textual data, as you noted (e.g., 'Mike' in your example). Numeric data (e.g., 3 in your example), object (table, column, etc) names and syntactic elements (e.g., update, set, where) should not be wrapped in quotes.
The single quote is the delimiter for the string. It lets the parser know where the string starts and where it ends as well as that is is a string. You will find that sometimes you get away with a double quote too.
The only way to be certain you don't miss any cases would be to escape the input, otherwise this will be vulnerable to abuse when somehow a single quote ends up in in the text.
Related
I noticed an inconsistency in how "HANA SQL" escapes single quotes in the context of the PLACEHOLDER clause. For example, consider the following PLACEHOLDER clause snippet:
('PLACEHOLDER' = ('$$CC_PARAM$$','''foo'',''an escaped single quote \'' '''))
The PLACEHOLDER clause above contains multiple values assigned to the CC_PARAM. parameter. We can see that inside of the second argument we have a single quote that's escaped with a backslash. However, we escape the single quotes outside each argument with another single quote (i.e. we do '' instead of \''. It's possible to use the \'' format for the first case, but it's not possible to use the '' format in the second case.
Why is there this discrepancy? It makes escaping quotes in multi-input input parameters tricky. I'm looking to programmatically craft SQL queries for HANA. Am I missing something here? Is it safe to use \'' over '' in all cases? Or do I need logic that can tell where a single quote occurs and escape as appropriate?
The implicit rule here - given by how the software is implemented - is that for parameter values of calculation views, the backslash \ is used to escape the single quotation mark.
For all standard SQL string occurrences, using the single-quotation mark twice '' is the correct way to differentiate between syntax element and string literal.
As for the why:
the PLACEHOLDER syntax is not SQL, but a HANA-specific command extension. So, there is no general standard that the current implementation violates.
that given, this command extension is embedded into, respectively clamped onto the standard SQL syntax and has to be handled by the same parser.
But the parameters are not only parsed once, by the SQL parser but again by the component that instantiates the calculation scenario based on the calculation view. With a bit of squinting it's not hard to see that the parameters interface is a general key-value interface that allows for all sorts of information to be handed over to the calc. engine.
One might argue that the whole approach of providing parameters via key-value pairs is not consistent with the general SQL syntax approach and be correct. On the flip side, this approach allows for general flexibility for adding new command elements to the HANA-specific parts, without structurally changing the syntax (and with it the parser).
The clear downside of this is that both the key names, as well as the values, are string-typed. To avoid losing the required escaping for the "inner string" an escape string different from the main SQL escape string needs to be used.
And here we are with two different ways of handing over a string value to be used as a filter condition.
Funny enough, both approaches may still lead to the same query execution plan.
As a matter of fact, in many scenarios with input parameters, the string value will be internally converted into a SQL conforming form. This is the case when the input parameter is used for filtering or in expressions in the calc. view that can be converted into SQL expressions.
For example
SELECT
"AAA"
FROM "_SYS_BIC"."sp/ESC"
('PLACEHOLDER' = ('$$IP_TEST$$', 'this is a test\''s test'));
shows the following execution plan on my system
OPERATOR_NAME OPERATOR_DETAILS
PROJECT TEST.AAA
COLUMN TABLE FILTER CONDITION: TEST.AAA = 'this is a test's test'
(DETAIL: ([SCAN] TEST.AAA = 'this is a test's test'))
Note how the escape-\' has been removed.
All in all: when using PLACEHOLDER values, the \' escaping needs to be used and in all other cases, the '' escaping.
That should not be terribly difficult to implement for a query builder as you can consider this when dealing with the PLACEHOLDER syntax.
I came across some SQL in an application which had no space before the "ORDER BY" clause. I was surprised that this even works.
Given a table of numbers, called [counter] where there is simply one column, counter_id that is an incrementing list of integers this SQL works fine in Microsoft SQL Server 2012
select
*
FROM [counter] c
where c.counter_id = 1000ORDER by counter_id
This also works with strings, e.g.:
WHERE some_string = 'test'ORDER BY something
My question is, are there any potential pitfalls or dangers with this query? And conversely, are there any benefits? Other than saving, what, 8 bits of network traffic for that whitespace (whcih may well be a consideration in some applications)
Let me explain the reason why this works with numbers and strings.
The reason is because numbers cannot start identifiers, unless the name is escaped. Basically, the first things that happens to a SQL query is tokenization. That is, the components of the query are broken into identifiers and keywords, which are then analyzed.
In SQL Server, keywords and identifiers and function names (and so on) cannot start with a digit (unless the name is escaped, of course). So, when the tokenizer encounters a digit, it knows that it has a number. The number ends when a non-digit character is encountered. So, a sequence of characters such as 1000ORDER BY is easily turned into three tokens, 1000, ORDER, and BY.
Similarly, the first time that a single quote is encountered, it always represents a string literal. The string literal ends when the final single quote is encountered. The next set of characters represents another token.
Let me add that there is exactly zero reason to ever use these nuances. First, these rules are properties of SQL Server's tokenization and do not necessarily apply to other databases. Second, the purpose of SQL is for humans to be able to express queries. It is way, way more important that we read them.
As jarlh mentioned there might be difference during scanning and parsing the tokens but it creates correctly during execution plan, hence it might not be huge difference in advantages or disadvantages
When parser examines characters ,it checks for keywords,identifiers,string constants and match overall semantic and syntactic structure of the language. Since 'Order by' is a keyword and sql parser knows its possible syntactic location in a query, it will interpret it accordingly without throwing any error. This is the reason why your order by will not throw any error.
Parsing sql query
Parsing SQL
I have a string with value
'MAX DATE QUERY: SELECT iso_timestamp(MAX(time_stamp)) AS MAXTIME FROM observation WHERE offering_id = 'HOBART''
But on inserting into postgresql table i am getting error:
org.postgresql.util.PSQLException: ERROR: syntax error at or near "HOBART".
This is probably because my string contains single quotes. I don't know my string value. Every time it keeps changing and may contain special characters like \ or something since I am reading from a file and saving into postgres database.
Please give a general solution to escape such characters.
As per the SQL standard, quotes are delimited by doubling them, ie:
insert into table (column) values ('I''m OK')
If you replace every single quote in your text with two single quotes, it will work.
Normally, a backslash escapes the following character, but literal backslashes are similarly escaped by using two backslashes"
insert into table (column) values ('Look in C:\\Temp')
You can use double dollar quotation to escape the special characters in your string.
The above query as mentioned insert into table (column) values ('I'm OK')
changes to insert into table (column) values ($$I'm OK$$).
To make the identifier unique so that it doesn't mix with the values, you can add any characters between 2 dollars such as
insert into table (column) values ($aesc6$I'm OK$aesc6$).
here $aesc6$ is the unique string identifier so that even if $$ is part of the value, it will be treated as a value and not a identifier.
You appear to be using Java and JDBC. Please read the JDBC tutorial, which describes how to use paramaterized queries to safely insert data without risking SQL injection problems.
Please read the prepared statements section of the JDBC tutorial and these simple examples in various languages including Java.
Since you're having issues with backslashes, not just 'single quotes', I'd say you're running PostgreSQL 9.0 or older, which default to standard_conforming_strings = off. In newer versions backslashes are only special if you use the PostgreSQL extension E'escape strings'. (This is why you always include your PostgreSQL version in questions).
You might also want to examine:
Why you should use prepared statements.
The PostgreSQL documentation on the lexical structure of SQL queries.
While it is possible to explicitly quote values, doing so is error-prone, slow and inefficient. You should use parameterized queries (prepared statements) to safely insert data.
In future, please include a code snippet that you're having a problem with and details of the language you're using, the PostgreSQL version, etc.
If you really must manually escape strings, you'll need to make sure that standard_conforming_strings is on and double quotes, eg don''t manually escape text; or use PostgreSQL-specific E'escape strings where you \'backslash escape\' quotes'. But really, use prepared statements, it's way easier.
Some possible approaches are:
use prepared statements
convert all special characters to their equivalent html entities.
use base64 encoding while storing the string, and base64 decoding while reading the string from the db table.
Approach 1 (prepared statements) can be combined with approaches 2 and 3.
Approach 3 (base64 encoding) converts all characters to hexadecimal characters without loosing any info. But you may not be able to do full-text search using this approach.
Literals in SQLServer start with N like this:
update table set stringField = N'/;l;sldl;'''mess'
Lately I have been doing a security pass on a PHP application and I've already found and fixed one XSS vulnerability (both in validating input and encoding the output).
How can I query the database to make sure there isn't any malicious data still residing in it? The fields in question should be text with allowable symbols (-, #, spaces) but shouldn't have any special html characters (<, ", ', >, etc).
I assume I should use regular expressions in the query; does anyone have prebuilt regexes especially for this purpose?
If you only care about non-alphanumerics and it's SQL Server you can use:
SELECT *
FROM MyTable
WHERE MyField LIKE '%[^a-z0-9]%'
This will show you any row where MyField has anything except a-z and 0-9.
EDIT:
Updated pattern would be: LIKE '%[^a-z0-9!-# ]%' ESCAPE '!'
I had to add the ESCAPE char since you want to allow dashes -.
For the same reason that you shouldn't be validating input against a black-list (i.e. list of illegal characters), I'd try to avoid doing the same in your search. I'm commenting without knowing the intent of the fields holding the data (i.e. name, address, "about me", etc.), but my suggestion would be to construct your query to identify what you do want in your database then identify the exceptions.
Reason being there are just simply so many different character patterns used in XSS. Take a look at the XSS Cheat Sheet and you'll start to get an idea. Particularly when you get into character encoding, just looking for things like angle brackets and quotes is not going to get you too far.
In one of my project, it was required to have a table with space in between. Some suggest me not to include spaces because it is not a good technique.
we can still implement it using single-double quotes for table name in queries. But i need a solid backing for not opting spaces. Please help.
It makes it harder to read, creates complexity if you ever want to do dynamic SQL. Spaces in the tables names on the other hand add no value whatsoever.
Mr. Anderson points out that its tedious. This is true enough, but more importantly it adds unnecessary tediousness.
I would never use spaces (nor other special characters) in table or column names.
Out of lazyness is one point (so typing SQL queries is a lot easier because you don't need those dreaded quotes)
Secondly a lot of tools out there might still have problems with non-standard table names.
Btw: the quote character for non-standard object names is a double quote (")
If you really go down that road, I would highly recommend to put MySQL into "ANSI Mode" in order to be compatible with the rest of the (DBMS) world.
(Single quotes are for character literals, double quotes for "escaping" non-standard names)