Using prepared statements with PDO, I understand there are two placeholder styles:
either ? or :name.
What are the limitations on named parameters? Whitespace? Non-ASCII characters?
(I'm well acquainted with the hell of non-ASCII in field names. So please stick to the topic.)
Those are tokens. The limits are probably A-Z and 0-9 characters, not starting with 0-9.
Judging from the switch statement that starts on line 304, I would say it is [a-zA-Z0-9_].
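To make that character set concrete, here is a minimal sketch of a name check. It assumes the answer's reading ([a-zA-Z0-9_]) is right; the function name is made up for illustration and is not part of PDO.

```python
import re

# Assumption: the allowed set is the one read from the parser above,
# [a-zA-Z0-9_]. This is not taken from PDO's documentation.
_PARAM_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_portable_param_name(name: str) -> bool:
    """Return True if `name` (without the leading colon) uses only [A-Za-z0-9_]."""
    return _PARAM_NAME.fullmatch(name) is not None


print(is_portable_param_name("user_id"))  # plain ASCII name
print(is_portable_param_name("user id"))  # contains whitespace
print(is_portable_param_name("año"))      # contains a non-ASCII character
```

Running a check like this over placeholder names before building a statement is a cheap way to stay inside the set that is known to work.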
Should I avoid special characters like "é á ç" in SQL table names and column names?
What are the pros and cons of using special characters?
As you can guess, there are pros and cons. This is more or less a subjective question.
SQL (unlike most programming languages) allows you to use special characters, whitespace, punctuation, or reserved words in your table or column identifiers.
It's pretty nice that people have the choice to use appropriate characters for their native language.
Especially in cases where a word changes its meaning significantly when spelled with the closest ASCII characters: e.g. año vs. ano.
But the downside is that if you do this, you have to use "delimited identifiers" every time you reference the table with special characters. In standard SQL, delimited identifiers use double-quotes.
SELECT * FROM "SELECT"
This is actually okay! If you want to use an SQL reserved word as a table name, you can do it. But it might cause some confusion for some readers of the code.
Likewise if you use special non-ASCII characters, it might make it hard for English-speaking programmers to maintain the code, because they are not familiar with the key sequence to type those special characters. Or they might forget that they have to delimit the table names.
SELECT * FROM "año"
Then there are non-standard delimited identifiers. Microsoft uses square brackets by default:
SELECT * FROM [año]
And MySQL uses back-ticks by default:
SELECT * FROM `año`
Though both can use the standard double-quotes as identifier delimiters if you enable certain options, you can't always rely on that, and if the option gets disabled, your code will stop working. So users of Microsoft and MySQL are kind of stuck using the non-standard delimiters, unfortunately.
Maintaining the code is simpler in some ways if you can stick with ASCII characters. But there are legitimate reasons to want to use special characters too.
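The three delimiter styles above can be sketched as one small helper. This is only an illustration: the function and the dialect labels are hypothetical, and it relies on the usual rule that a closing delimiter inside the name is escaped by doubling it. The standard double-quote form is then tried against an in-memory SQLite database.

```python
import sqlite3

# Opening and closing delimiter per style. The dialect labels are
# illustrative names chosen for this sketch.
_DELIMS = {
    "standard": ('"', '"'),  # standard SQL double quotes
    "mysql": ("`", "`"),     # MySQL back-ticks
    "mssql": ("[", "]"),     # Microsoft square brackets
}


def quote_identifier(name: str, dialect: str = "standard") -> str:
    """Wrap `name` in the dialect's delimiters, doubling any closing delimiter."""
    open_ch, close_ch = _DELIMS[dialect]
    return open_ch + name.replace(close_ch, close_ch * 2) + close_ch


# Try the standard form with SQLite, which accepts double-quoted identifiers.
conn = sqlite3.connect(":memory:")
table = quote_identifier("año")
conn.execute(f"CREATE TABLE {table} (id INTEGER)")
conn.execute(f"INSERT INTO {table} VALUES (1)")
rows = conn.execute(f"SELECT * FROM {table}").fetchall()
print(table, rows)
```

Keeping the quoting in one place at least removes the "forgot to delimit the table name" problem, whichever delimiter your database wants.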
Lately I have been doing a security pass on a PHP application and I've already found and fixed one XSS vulnerability (both in validating input and encoding the output).
How can I query the database to make sure there isn't any malicious data still residing in it? The fields in question should be text with allowable symbols (-, #, spaces) but shouldn't have any special html characters (<, ", ', >, etc).
I assume I should use regular expressions in the query; does anyone have prebuilt regexes especially for this purpose?
If you only care about non-alphanumerics and it's SQL Server you can use:
SELECT *
FROM MyTable
WHERE MyField LIKE '%[^a-z0-9]%'
This will show you any row where MyField has anything except a-z and 0-9.
EDIT:
Updated pattern would be: LIKE '%[^a-z0-9!-# ]%' ESCAPE '!'
I had to add the ESCAPE char since you want to allow dashes -.
For the same reason that you shouldn't be validating input against a black-list (i.e. list of illegal characters), I'd try to avoid doing the same in your search. I'm commenting without knowing the intent of the fields holding the data (i.e. name, address, "about me", etc.), but my suggestion would be to construct your query to identify what you do want in your database then identify the exceptions.
Reason being there are just simply so many different character patterns used in XSS. Take a look at the XSS Cheat Sheet and you'll start to get an idea. Particularly when you get into character encoding, just looking for things like angle brackets and quotes is not going to get you too far.
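As a rough sketch of the "identify what you do want, then list the exceptions" idea: the allow-list below is taken from the question (letters, digits, -, #, spaces), the MyTable/MyField names are borrowed from the earlier answer, and the sample rows are invented. It does the matching in application code against SQLite rather than in the query, so treat it as an illustration of the approach, not as something tuned for a large table.

```python
import re
import sqlite3

# Allow-list from the question: letters, digits, '-', '#', and spaces.
ALLOWED = re.compile(r"[A-Za-z0-9#\- ]*")


def find_suspect_rows(conn, table, id_col, text_col):
    """Return (id, value) pairs whose text has any character outside ALLOWED."""
    # Table and column names are trusted constants here, never user input.
    cur = conn.execute(
        f'SELECT "{id_col}", "{text_col}" FROM "{table}" ORDER BY "{id_col}"'
    )
    return [(i, v) for i, v in cur if v is not None and not ALLOWED.fullmatch(v)]


# Invented sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, MyField TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "Apt #4-B"), (2, "<script>alert(1)</script>"), (3, 'O"Brien')],
)
suspects = find_suspect_rows(conn, "MyTable", "id", "MyField")
print(suspects)
```

Anything the function returns is an exception to the allow-list and worth reviewing by hand.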
I am able to store values in couchdb-lucene with whatever key I like, but it seems that if the key includes any chars outside of [0-9a-zA-Z_] any search fails.
Does anyone know what chars are valid and/or how to properly escape special chars in searches such that special chars can be used?
This shows how to escape special characters and also gives a list of such characters.
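For reference, a minimal sketch of that escaping, assuming the special-character list of Lucene's classic query parser (\ + - ! ( ) : ^ [ ] " { } ~ * ? | & /), each escaped with a leading backslash. The Lucene version bundled with your couchdb-lucene may use a slightly different list (older versions do not treat / as special), so check it against your version. The function name is made up.

```python
import re

# Assumption: special characters as listed for Lucene's classic query parser.
_LUCENE_SPECIAL = re.compile(r'[\\+\-!():^\[\]"{}~*?|&/]')


def escape_lucene(term: str) -> str:
    """Put a backslash in front of every Lucene special character in `term`."""
    return _LUCENE_SPECIAL.sub(lambda m: "\\" + m.group(0), term)


print(escape_lucene("foo-bar:baz"))
print(escape_lucene("plain_key1"))
```

Keys made only of [0-9a-zA-Z_] pass through unchanged, which matches the observation that those were the only ones searchable without escaping.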
All UTF-8 characters should work. I've just verified that I can search for items with é, for example.
A little more information on how you're querying would help, though given the age of this ticket perhaps you've moved on.
I have noticed that using either Oracle or SQLite, queries like this are perfectly valid
SELECT*FROM(SELECT a,MAX(b)i FROM c GROUP BY a)WHERE(a=1)OR(i=2);
Is that a “feature” of SQL that keywords or words of a query need not be surrounded with whitespace? If so, why was it designed this way? SQL has been designed to be readable, this seems to be a form of obfuscation (particularly the MAX(b)i thing where i is a token which serves as an alias).
The SQL-92 BNF grammar explicitly states that delimiters (brackets, whitespace, *, etc.) are valid ways to break up tokens, which makes whitespace optional wherever another delimiter already separates the tokens.
This is true not only for SQLite and Oracle, but MySQL and SQL Server at least (that I work with and have tested), since it is specified in the language definition.
Whitespace is optional in pretty much any language where it is not absolutely necessary to preserve boundaries between keywords and/or identifiers. You could write code in C# that looked similar to your SQL, and as long as the compiler can still parse the identifiers and keywords, it doesn't care.
Case in point: The subquery of your statement is the only place where whitespace is needed to separate keywords from other alpha characters. Everywhere else, some non-alphanumeric character (which aren't part of any keyword in SQL) separates keywords, so the SQL parser can still digest this statement. As long as that is true, whitespace is purely for human readability.
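A quick way to see this for yourself is to run the compact statement from the question next to a conventionally spaced version and compare the results. The sketch below uses SQLite through Python's sqlite3 module; the rows in table c are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c (a INTEGER, b INTEGER)")
# Invented sample rows.
conn.executemany("INSERT INTO c VALUES (?, ?)", [(1, 10), (1, 20), (2, 2), (3, 5)])

# The statement from the question, with almost no whitespace.
compact = "SELECT*FROM(SELECT a,MAX(b)i FROM c GROUP BY a)WHERE(a=1)OR(i=2);"

# The same statement written the usual way.
spaced = """
SELECT *
FROM (SELECT a, MAX(b) AS i FROM c GROUP BY a)
WHERE (a = 1) OR (i = 2);
"""

compact_rows = sorted(conn.execute(compact).fetchall())
spaced_rows = sorted(conn.execute(spaced).fetchall())
print(compact_rows == spaced_rows, compact_rows)
```

Both forms tokenize to the same sequence, so the parser sees the same statement either way.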
Most of this is valid simply because you've enclosed key sections in parentheses where white space would ordinarily be required.
I think this is a side effect of the parser.
Compilers usually ignore whitespace by treating it as SKIP tokens: tokens the compiler discards, but which still cause an error if they appear in the middle of a reserved word. For example, in C, 'while' is valid but 'whi le' is not, even though the whitespace is a SKIP token.
The reason is that it simplifies the parser. Otherwise the parser would have to keep track of all the whitespace, which can get quite complex unless you set strict rules like Python does, and that would be both hard to impose on vendors like Oracle and make SQL more complex than it should be.
That simplification has the (unintended?) side effect that you can remove MOST (not all) whitespace. Be aware that in some cases removing whitespace causes compilation errors (you can't remove the space in GROUP BY, as that's part of the token).
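The GROUP BY case is easy to check. Here is a small sketch against SQLite; the helper name is made up, and other engines may word the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c (a INTEGER, b INTEGER)")


def parses(sql: str) -> bool:
    """Return True if SQLite accepts `sql`, False if it raises an error."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.OperationalError:
        return False


print(parses("SELECT a FROM c GROUP BY a"))  # space kept between the keywords
print(parses("SELECT a FROM c GROUPBY a"))   # space removed
```

With the space removed, GROUPBY is read as a single identifier rather than two keywords, so the statement no longer matches the grammar.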
We're currently replacing all special characters and spaces in our URLs with hyphens (-). From an SEO and readability point of view this works fine. However, in some cases, we are feeding parts of the URL into a search after stripping the hyphens out. The problem occurs when the search term should have hyphens, as it returns no results when they get stripped. We could modify the search algorithm we're using, but this will slow it down (especially bad as we're using it with an AJAX-ed search box and this needs to be fast).
The best option to deal with this, as far as we can tell, is to replace pre-existing hyphens with pipes (|). I have a feeling that this will have a negative impact on SEO for those terms, as the pipe character will be treated as a part of the word and not as a separator. As far as I can tell, the only characters that are considered to be separators are hyphens and forward slashes (/).
So my questions are:
Are there alternative characters we can use to represent hyphens?
If we can't use any other characters, how much impact will using a pipe character have on a search engine?
Cheers,
Zac
Would ~ (tilde) work?
Edit: Google now treats underscores and dashes as word separators so you can use dashes as dashes and underscores as spaces.
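A minimal sketch of that scheme, with made-up function names: spaces become underscores, hyphens stay as they are, and the search side only has to turn underscores back into spaces. It assumes the original terms never contain underscores themselves, otherwise the round trip is lossy.

```python
def to_slug(term: str) -> str:
    """Lowercase the term and turn spaces into underscores; hyphens are kept."""
    return term.strip().lower().replace(" ", "_")


def from_slug(slug: str) -> str:
    """Recover the search term: underscores back to spaces, hyphens untouched."""
    return slug.replace("_", " ")


slug = to_slug("Wi-Fi router")
print(slug, "->", from_slug(slug))
```

Because the hyphen is never used as a stand-in for a space, nothing needs to be stripped before the term goes into the search.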
Why not use Url Encoding? Most frameworks have built in utilities to do this.
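For example, with Python's standard library (the sample term is invented), percent-encoding round-trips the term exactly and leaves hyphens alone:

```python
from urllib.parse import quote, unquote

term = "wi-fi router & más"

# safe="" means even "/" is encoded; letters, digits and "_.-~" are never encoded.
encoded = quote(term, safe="")
decoded = unquote(encoded)

print(encoded)
print(decoded == term)
```

The trade-off is readability: spaces and other special characters show up as percent sequences in the URL.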
I was going to say the same thing about URL encoding, but if you're trying to get rid of the special characters, I suppose you don't want URLs with percent signs, right?
What about altering the algorithm that "feeds parts of the URL into a search"? Couldn't you add some logic to not replace hyphens within the search query part of the URL?