Postgres Querying ILIKE and "%#{}%" - sql

Can someone explain to me how the "?" prevents SQL injection?
Candy.where("taste ILIKE ?", "%#{term}%")
Also, why is "%#{term}%" used as opposed to just #{term}? What do the percent signs represent?

Percent signs are wildcard characters: each one matches any sequence of characters, so the term can appear in any part of the value.

You've actually asked two different questions there.
Neither of them is particularly related to Rails, so I am going to answer them generically (also because I'm not that familiar with Ruby!).
How does using '?' prevent SQL Injection
SQL Injection occurs when you use values provided from outside your program - user provided values - directly in SQL statements. For example, suppose you had this pseudo code:
sql="SELECT foo FROM bar WHERE name='"+name+"'"
where perhaps name was a variable containing user inputted data. However, if name contained a single quote (') then the SQL engine would think that single quote was the end of the value and continue parsing the remainder of the variable as SQL text.
Using placeholders (such as '?') avoids this because the value inside the placeholder does not need to be quoted - all of the content of the placeholder is treated as part of the value, and none of it will be parsed as SQL, regardless of any embedded quotes.
Incidentally, the actual form of the placeholders depends on the DB engine and/or the client framework. Natively, PostgreSQL uses $1, $2, etc. for placeholders. Many frameworks extend this to allow '?' and other placeholder syntaxes.
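A minimal sketch of the difference, using Python's built-in sqlite3 module instead of Rails and PostgreSQL purely so that the example is self-contained. The table bar and its rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (name TEXT, foo TEXT)")
conn.executemany("INSERT INTO bar VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])

# Hostile input: the embedded quote ends the string literal early.
name = "nobody' OR '1'='1"

# Unsafe: the value is spliced into the SQL text and parsed as SQL.
unsafe_sql = "SELECT foo FROM bar WHERE name='" + name + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is sent separately and is only ever compared as data.
safe_rows = conn.execute("SELECT foo FROM bar WHERE name = ?", (name,)).fetchall()

print(len(unsafe_rows))  # 2: the injected OR clause is true for every row
print(len(safe_rows))    # 0: nobody is literally named "nobody' OR '1'='1"
```

The concatenated query returns every row because the quote inside name closes the literal and the rest is read as SQL. The placeholder version finds nothing, since the whole string is treated as one value.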
Why is "%#{term}%" used as opposed to just #{term}
The SQL ILIKE operator uses '%' signs as wildcards. An expression such as:
taste ILIKE '%apple%'
would match 'apple', 'rottenApple', 'applesauce' or any other string containing 'apple' using a case insensitive match.
Note that the '%' signs are part of the right hand operand to ILIKE, within the quotes, so you cannot use placeholders like this:
Candy.where("taste ILIKE %?%", "#{term}")
An alternative would be:
Candy.where("taste ILIKE '%' || ? || '%'", "#{term}")
That works because || is the concatenation operator, so it concatenates the literal value % with the value of the placeholder and then the trailing literal value %.
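Both variants can be tried out with Python's built-in sqlite3 module. This is only a sketch: SQLite has no ILIKE, but its LIKE is case-insensitive for ASCII letters by default, so it stands in here. The candy table and its rows are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candy (taste TEXT)")
conn.executemany("INSERT INTO candy VALUES (?)",
                 [("apple",), ("rottenApple",), ("applesauce",), ("cherry",)])

term = "apple"

# Option 1: build the pattern in the application and bind it as one value.
rows_a = conn.execute(
    "SELECT taste FROM candy WHERE taste LIKE ? ORDER BY taste",
    ("%" + term + "%",),
).fetchall()

# Option 2: bind only the term and add the wildcards in SQL with ||.
rows_b = conn.execute(
    "SELECT taste FROM candy WHERE taste LIKE '%' || ? || '%' ORDER BY taste",
    (term,),
).fetchall()

print(rows_a)  # [('apple',), ('applesauce',), ('rottenApple',)]
print(rows_a == rows_b)  # True
```

In both cases the placeholder stands for a complete value. The wildcards are either already inside that value or concatenated onto it by the database.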

When you use ? Rails will escape SQL control characters by itself.

Related

Django ORM underscore wildcard

I have been searching for a way of using Django ORM to use the SQL underscore wildcard, and do something equivalent to this:
SELECT * FROM table
WHERE field LIKE 'abc_wxyz'
Currently, I am doing:
field_like = 'abc_wxyz'
result = MyClass.objects.extra(where=["field LIKE %s"], params=[field_like])
I already tried contains and icontains, but they are not what I need, because they wrap the value in percent signs and escape the underscore:
SELECT * FROM table
WHERE field LIKE '%abc\_wxyz%'
Thanks!
You can use the __regex lookup to build more complex lookup expressions than __contains, __startswith or __endswith allow (prefix any of these with "i" to make the lookup case-insensitive, as in icontains). In your case, I think
MyClass.objects.filter(field__regex=r'^abc.wxyz$')
Would do what you are trying to do.
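To show how such a regex corresponds to the original LIKE pattern, here is a small sketch in plain Python. The helper like_to_regex is hypothetical (it is not part of Django) and it assumes a pattern with no ESCAPE clause:

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern (no ESCAPE clause) into an anchored regex."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")            # exactly one character
        elif ch == "%":
            parts.append(".*")           # any run of characters, possibly empty
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return "^" + "".join(parts) + "$"

regex = like_to_regex("abc_wxyz")
print(regex)  # ^abc.wxyz$
print(bool(re.match(regex, "abcXwxyz")))  # True
print(bool(re.match(regex, "abcwxyz")))   # False: the underscore needs one character
```

The resulting string is the same expression that is passed to field__regex above.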
You can use the field__contains attribute.
for example:
MyClass.objects.filter(field__contains='abc_wxyz')
This is equivalent to:
SELECT * FROM MyClass WHERE field LIKE 'abc_wxyz'
Lord Elron's answer is incorrect. Django escapes all developer supplied wildcard characters to the LIKE-type lookups. The statement is equivalent to
SELECT * FROM MyClass WHERE field LIKE '%abc\_wxyz%'
(as the OP discovered) and the underscore has no effect.
See Escaping percent signs and underscores in LIKE statements
The field lookups that equate to LIKE SQL statements (iexact, contains, icontains, startswith, istartswith, endswith and iendswith) will automatically escape the two special characters used in LIKE statements – the percent sign and the underscore.
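The effect of that escaping can be reproduced outside Django. The sketch below uses Python's sqlite3 module and a hypothetical helper, escape_like, that mimics the idea. It is not Django's own code, and the table t is invented:

```python
import sqlite3

def escape_like(value: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so they match literally (hypothetical helper)."""
    return (value.replace(escape, escape + escape)
                 .replace("%", escape + "%")
                 .replace("_", escape + "_"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("abc_wxyz",), ("abcXwxyz",)])

# Unescaped: the underscore is a wildcard and matches any single character.
raw = conn.execute(
    "SELECT field FROM t WHERE field LIKE ? ORDER BY field",
    ("abc_wxyz",),
).fetchall()

# Escaped: the underscore only matches a literal underscore.
escaped = conn.execute(
    "SELECT field FROM t WHERE field LIKE ? ESCAPE '\\' ORDER BY field",
    (escape_like("abc_wxyz"),),
).fetchall()

print(raw)      # [('abcXwxyz',), ('abc_wxyz',)]
print(escaped)  # [('abc_wxyz',)]
```

This is why a wildcard typed into a contains-style lookup has no effect: by the time the pattern reaches the database, it has been turned into a literal character.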

Escaping single quotes in the PLACEHOLDER clause of a HANA SQL statement

I noticed an inconsistency in how "HANA SQL" escapes single quotes in the context of the PLACEHOLDER clause. For example, consider the following PLACEHOLDER clause snippet:
('PLACEHOLDER' = ('$$CC_PARAM$$','''foo'',''an escaped single quote \'' '''))
The PLACEHOLDER clause above contains multiple values assigned to the CC_PARAM parameter. We can see that inside the second argument we have a single quote that's escaped with a backslash. However, we escape the single quotes outside each argument with another single quote (i.e. we write '' instead of \''). It's possible to use the \'' format for the first case, but it's not possible to use the '' format in the second case.
Why is there this discrepancy? It makes escaping quotes in multi-input input parameters tricky. I'm looking to programmatically craft SQL queries for HANA. Am I missing something here? Is it safe to use \'' over '' in all cases? Or do I need logic that can tell where a single quote occurs and escape as appropriate?
The implicit rule here - given by how the software is implemented - is that for parameter values of calculation views, the backslash \ is used to escape the single quotation mark.
For all standard SQL string occurrences, using the single-quotation mark twice '' is the correct way to differentiate between syntax element and string literal.
As for the why:
The PLACEHOLDER syntax is not SQL, but a HANA-specific command extension. So, there is no general standard that the current implementation violates.
That said, this command extension is embedded into (or rather bolted onto) the standard SQL syntax and has to be handled by the same parser.
But the parameters are parsed not just once by the SQL parser, but a second time by the component that instantiates the calculation scenario based on the calculation view. With a bit of squinting it's not hard to see that the parameters interface is a general key-value interface that allows for all sorts of information to be handed over to the calc. engine.
One might argue that the whole approach of providing parameters via key-value pairs is not consistent with the general SQL syntax approach and be correct. On the flip side, this approach allows for general flexibility for adding new command elements to the HANA-specific parts, without structurally changing the syntax (and with it the parser).
The clear downside of this is that both the key names, as well as the values, are string-typed. To avoid losing the required escaping for the "inner string" an escape string different from the main SQL escape string needs to be used.
And here we are with two different ways of handing over a string value to be used as a filter condition.
Funny enough, both approaches may still lead to the same query execution plan.
As a matter of fact, in many scenarios with input parameters, the string value will be internally converted into a SQL conforming form. This is the case when the input parameter is used for filtering or in expressions in the calc. view that can be converted into SQL expressions.
For example
SELECT
"AAA"
FROM "_SYS_BIC"."sp/ESC"
('PLACEHOLDER' = ('$$IP_TEST$$', 'this is a test\''s test'));
shows the following execution plan on my system
OPERATOR_NAME OPERATOR_DETAILS
PROJECT TEST.AAA
COLUMN TABLE FILTER CONDITION: TEST.AAA = 'this is a test's test'
(DETAIL: ([SCAN] TEST.AAA = 'this is a test's test'))
Note how the escape-\' has been removed.
All in all: when using PLACEHOLDER values, the \' escaping needs to be used and in all other cases, the '' escaping.
That should not be terribly difficult to implement for a query builder as you can consider this when dealing with the PLACEHOLDER syntax.
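A rough sketch of what such query-builder logic might look like, in Python. Both function names are hypothetical, and the code follows only the two rules stated above. It has not been checked against a HANA system and does not deal with values that already contain backslashes:

```python
def sql_string_literal(value: str) -> str:
    """Quote a value as a standard SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"

def placeholder_clause(name: str, value: str) -> str:
    """Build a PLACEHOLDER clause for one input parameter (hypothetical helper)."""
    # Inner level: a quote inside the parameter value is escaped as \'.
    inner = value.replace("'", "\\'")
    # Outer level: the result is still an SQL string literal, so each quote
    # is doubled as well, which yields \'' in the final statement text.
    return "('PLACEHOLDER' = ({}, {}))".format(
        sql_string_literal("$$" + name + "$$"),
        sql_string_literal(inner),
    )

clause = placeholder_clause("IP_TEST", "this is a test's test")
print(clause)
# ('PLACEHOLDER' = ('$$IP_TEST$$', 'this is a test\''s test'))
```

The output has the same shape as the clause in the example query above: the backslash comes from the inner escaping and the doubled quote from the ordinary SQL literal rule.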

How does SQL's LIKE work in case of Path Enumeration?

I am reading the book SQL Antipatterns where a SQL query is used like this:
SELECT *
FROM Comments AS c
WHERE '1/4/6/7/' LIKE c.path || '%';
to find ancestors of comment #7 from this table:
I am not very familiar with the pattern syntax used by LIKE and would appreciate understanding how it does its work. Specifically, does it matter that the literal '1/4/6/7/' is located on the left-hand side of the LIKE keyword? And how does the entire WHERE predicate work (i.e. || '%')?
First of all, in case it is not clear, the || is the string concatenation operator. So, if the value of c.path is '1/', then c.path || '%' yields '1/%'.
Usually, what we do with LIKE is WHERE field LIKE 'constant%' to check whether the value of the field starts with the constant. Here the author of the query wants to check whether the constant starts with the value of the field, which is an unusual thing to do.
So you cannot write WHERE field LIKE 'constant%' here, because in this kind of query it is the constant that may be longer than the field, and not the other way around.
Simple LIKE expression in SQL (as opposed to regex LIKE, available in some RDBMS) does not support regular expressions. Instead, it supports two special "wildcard" characters: underscore _ that is roughly equivalent to dot . in regex, and percent % which is roughly equivalent to .* construct.
|| in the example is concatenation operator, similar to operator + applied to String objects in Java. Hence, a constant value 1/4/6/7/ is compared to a string from the path column followed by any characters - essentially, a prefix match.
This is a bad approach, because it places data from the table on the right side of the LIKE expression. This is very expensive, because this operation cannot use indexing, making the search run very slowly.
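The query can be tried out with Python's built-in sqlite3 module. The rows below are assumed sample data in the path-enumeration style (the table from the question is not reproduced here), and the comment_id column name is likewise an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany("INSERT INTO Comments VALUES (?, ?)", [
    (1, "1/"), (2, "1/2/"), (3, "1/2/3/"), (4, "1/4/"),
    (5, "1/4/5/"), (6, "1/4/6/"), (7, "1/4/6/7/"),
])

# Each row's path plus '%' becomes the pattern; the constant is the subject.
ancestors = [row[0] for row in conn.execute(
    "SELECT comment_id FROM Comments AS c "
    "WHERE '1/4/6/7/' LIKE c.path || '%' ORDER BY comment_id")]

print(ancestors)  # [1, 4, 6, 7]
```

For the row with path '1/4/' the predicate becomes '1/4/6/7/' LIKE '1/4/%', which is true. For '1/2/' it becomes '1/4/6/7/' LIKE '1/2/%', which is false. Note that comment 7 matches its own path, so it appears in the result as well.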

SQL LIKE to find either -,/,_

Trying to select from table where the format can be either 1/2/2014, 1-2-2014 or 1_2_2014 in a text field. There's other text involved outside of this format but it shouldn't matter, but that's why this is text not a date type.
I tried '%[-,_,/]%[-,_,/]%' which doesn't work, and I've tried escaping the special characters in the brackets, such as '%[-,!_,/]%[-,!_,/]%' ESCAPE '!', which also doesn't work. Any suggestions?
I wanted to avoid using three searches like,
LIKE '%/%/%'
OR LIKE '%-%-%'
OR LIKE '%!_%!_%' ESCAPE '!'
EDIT: Using SQLite3
There is no regex-like behavior in the LIKE operator in SQL. You would have to use separate expressions and OR them together:
select * from table
where column like '%-%-%'
or column like '%/%/%'
Thanks for the information. I ended up switching to the GLOB operator which support [] in SQLite.
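The example was changed to GLOB '?[/_-]?[/_-]??*', where * serves as % and ? serves as _ for the GLOB operator. The hyphen goes last inside the brackets so that it is taken literally; written as [/-_] it would be read as a range of characters from / to _.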
Also thanks to Amadeaus9 for pointing out minimum characters between delimiters so that '//' isn't a valid answer.
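One detail worth checking with bracket sets in GLOB is where the hyphen sits, because a hyphen between two characters denotes a range. The following sketch uses Python's sqlite3 module to compare two spellings of the set. The helper glob_match and the sample strings are made up for the test:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def glob_match(value: str, pattern: str) -> bool:
    """Return True when SQLite's GLOB operator matches value against pattern."""
    return bool(conn.execute("SELECT ? GLOB ?", (value, pattern)).fetchone()[0])

RANGE = "?[/-_]?[/-_]??*"    # hyphen in the middle: a range from '/' to '_'
LITERAL = "?[/_-]?[/_-]??*"  # hyphen last: a literal hyphen

for value in ("1/2/2014", "1-2-2014", "1_2_2014", "1X2X2014"):
    print(value, glob_match(value, RANGE), glob_match(value, LITERAL))
```

With the range form, '1-2-2014' is rejected (the hyphen sorts before '/') while '1X2X2014' is accepted (uppercase letters fall between '/' and '_'). The literal form accepts exactly the three intended delimiters.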
If you're using T-SQL (AKA SQL Server) you don't want to have commas in the character set - i.e. LIKE '%[/_-]%[/_-]%'. However, keep in mind that this can match ANYTHING that has, anywhere within it, any two characters from the set.
EDIT: it doesn't look like SQLite supports that sort of use of its LIKE operator, based on this link.
Relevant quote:
There are two wildcards used in conjunction with the LIKE operator:
The percent sign (%)
The underscore (_)
However, you may want to take a look at this question, which details using regex in SQLite.
It is not possible using the LIKE syntax.
However, SQLite3 does support the REGEXP operator; this is syntactic sugar for calling a user-defined function that actually does the matching. If one is provided by your platform, then you could use, for example,
x REGEXP '.*[/_-].*[/_-].*'
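In Python, for instance, the sqlite3 module does not ship a REGEXP implementation, but one can be registered with create_function. A minimal sketch, where the table t and its rows are invented for the example:

```python
import re
import sqlite3

def regexp(pattern, value):
    """Backing function for `value REGEXP pattern`; SQLite passes the pattern first."""
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

conn.execute("CREATE TABLE t (x TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [
    ("due 1/2/2014",), ("due 1-2-2014",), ("due 1_2_2014",), ("no date here",),
])

rows = [r[0] for r in conn.execute(
    "SELECT x FROM t WHERE x REGEXP '.*[/_-].*[/_-].*' ORDER BY x")]
print(rows)  # ['due 1-2-2014', 'due 1/2/2014', 'due 1_2_2014']
```

SQLite rewrites x REGEXP y as a call to regexp(y, x), which is why the function takes the pattern as its first argument.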

SQLite not using index when using concatenation

I am using the following SQL statement for SQLite:
select * from words where "word" like ? || '%' || ? ;
In order to bind parameters to the first and last letters. I have tested this both with and without an index on the column word, and the results are the same. However, when running the queries as
select * from words where "word" like 'a%a';
etc. (that is, hardcoding each value instead of using ||), the query is about 10x faster when indexed.
Can someone show me how to use the index and the parameters both?
I found an answer thanks to the sqlite mailing list. It says here (http://sqlite.org/optoverview.html), section 4: "The right-hand side of the LIKE or GLOB must be either a string literal or a parameter bound to a string literal that does not begin with a wildcard character."
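Going by that quote, one option is to assemble the whole pattern in application code and bind it as a single parameter, so that the right-hand side of LIKE is a plain parameter rather than a || expression. A sketch with Python's sqlite3 module follows. The COLLATE NOCASE column and the sample words are assumptions, and whether the index is actually chosen depends on the SQLite version and settings, so the code prints the query plans instead of assuming them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: a NOCASE column, to pair with SQLite's case-insensitive LIKE.
conn.execute("CREATE TABLE words (word TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX words_word ON words (word)")
conn.executemany("INSERT INTO words VALUES (?)",
                 [("alpha",), ("area",), ("banana",), ("abc",)])

first, last = "a", "a"

# Concatenation inside SQL: the right-hand side of LIKE is an expression.
concat_sql = "SELECT word FROM words WHERE \"word\" LIKE ? || '%' || ? ORDER BY word"
concat_rows = conn.execute(concat_sql, (first, last)).fetchall()

# Pattern assembled in Python and bound as one parameter.
bound_sql = 'SELECT word FROM words WHERE "word" LIKE ? ORDER BY word'
bound_rows = conn.execute(bound_sql, (first + "%" + last,)).fetchall()

print(concat_rows == bound_rows)  # True: both return alpha and area

# Print both plans for comparison; the exact text varies by SQLite version.
for sql, params in ((concat_sql, (first, last)),
                    (bound_sql, (first + "%" + last,))):
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params):
        print(row)
```

Both statements return the same rows. The difference, if any, shows up only in the printed plans.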