I am writing a backend application which needs to be able to send multiple SQL commands to a MySQL server.
MySQL >= 5.x supports multiple statements, but unfortunately we are interfacing with MySQL 4.x.
I am trying to find a way (hint: regex) to split SQL statements on their semicolons while ignoring semicolons inside single- and double-quoted strings.
http://www.dev-explorer.com/articles/multiple-mysql-queries has a very nice regex to do that, but doesn't support double quotes.
I'd be happy to hear your suggestions.
This can't be done with regex; it is insufficiently powerful to parse SQL. There may be an SQL parser available for your language (which is it?), but parsing SQL is quite hard, especially given the range of different syntaxes available. Even in MySQL alone there are many SQL_MODE flags at the server and connection level that can affect how basic strings and comments are parsed, making statements behave quite differently.
The example at dev-explorer goes to amusing lengths to try to cope with escaped apostrophes and trailing strings, but will still fail for many valid combinations of them, not to mention the double quotes, backticks, the various comment syntaxes, or ANSI SQL_MODE.
As bobince said, regular expressions are probably not going to be powerful enough to do this. They're certainly not going to be powerful enough to do it in any halfway elegant manner. The second link cdonner provided also does not address this; most answers there were trying to talk the questioner out of doing this without semicolons; if he had taken the general advice, then he'd have ended up where you are.
I think the quickest path to solving this is going to be with a string scanner function, that examines every character of the string in sequence, and reacts based on a bit of stored state. Rough pseudocode:
Read in a character
If the character is not special, CONTINUE
If the character is escaped (checking this probably requires examining the previous character), CONTINUE
If the character would start a new string or end an existing one, toggle a flag IN_STRING (you might need multiple flags for different string types... I've honestly tried and succeeded at remaining ignorant of the minutiae of SQL quoting/escaping) and CONTINUE
If the character is a semicolon AND we are not currently in a string, we have found a query! OUTPUT it and CONTINUE scanning until the end of the string.
Language parsing is not one of my areas of experience, so you'll want to consider that approach carefully; nonetheless, it's going to be fast (with C-style strings, none of those steps are at all expensive, save possibly for the OUTPUT, depending on what "outputting" means in your context) and I think it should get the job done.
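The pseudocode above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a full SQL parser: it assumes backslash escapes and doubled quotes inside '...' and "..." strings, treats backticks as a third quote type, and ignores comments and SQL_MODE settings entirely. The function name split_statements is made up for this example.

```python
def split_statements(sql):
    """Split sql on semicolons that are not inside a quoted string.

    Assumptions (not a full SQL parser): backslash escapes and doubled
    quotes are honoured inside '...' and "..."; comments are ignored.
    """
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, or None
    i = 0
    while i < len(sql):
        ch = sql[i]
        if quote is not None:
            current.append(ch)
            if ch == "\\" and quote != "`" and i + 1 < len(sql):
                # Backslash escape: copy the next character verbatim.
                current.append(sql[i + 1])
                i += 1
            elif ch == quote:
                if i + 1 < len(sql) and sql[i + 1] == quote:
                    # A doubled quote is an escaped quote, not a terminator.
                    current.append(quote)
                    i += 1
                else:
                    quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif ch == ";":
            # Semicolon outside any string: a statement ends here.
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


if __name__ == "__main__":
    for statement in split_statements("SELECT 'hello;world' FROM DUAL; SELECT \"a;b\" FROM DUAL;"):
        print(statement)
```

Keeping the current quote character in a single variable, instead of one flag per string type, means a double quote inside a single-quoted string is treated as ordinary text.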
Maybe with the following Java regexp? Check the test...
@Test
public void testRegexp() {
    String s = //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n" + //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n";
    String regexp = "([^;]*?('.*?')?)*?;\\s*";
    assertEquals("<statement><statement>", s.replaceAll(regexp, "<statement>"));
}
I would suggest seeing if you can redefine the problem space so the need to send multiple queries separated only by their terminator is not required.
Try this. Just replaced the 1st ' with \" and it seems to work for both ' and "
;+(?=([^\"|^\\']['|\\'][^'|^\\']['|\\'])[^'|^\\'][^'|^\\']$)
I have some automated workflow, which includes updating a column via SQL with HTML tags in it.
The basic SQL statement goes like this:
UPDATE content SET bodytext = '<div class="one two three">Here comes a whole lot of HTML with all special chars and double quotes " and single quotes ' and empty lines and all possible kind of stuff...</div>' WHERE pid = 10;
Is there a way to make MariaDB or MySQL escape things automatically in SQL (without PHP)?
I'd suggest using prepared statements. This way you separate the statement from its parameters and don't need to care about the additional escaping that plain SQL requires.
Using functionality provided in PHP's MySQLi driver would simplify the process:
https://www.w3schools.com/php/php_mysql_prepared_statements.asp
Prepared statements are also possible in plain SQL, but I'm not sure doing it manually would be worth the hassle:
https://dev.mysql.com/doc/refman/8.0/en/sql-prepared-statements.html
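As a rough illustration of why parameter binding removes the escaping problem, here is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL or MariaDB (an assumption made so the example runs without a server). The table and column names follow the question; the HTML value is shortened. MySQL drivers typically use a different placeholder style than the ? shown here.

```python
import sqlite3

# HTML containing both quote types and an empty line, passed as a parameter.
html = """<div class="one two three">double quotes " and single quotes ' and

empty lines...</div>"""

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
conn.execute("CREATE TABLE content (pid INTEGER PRIMARY KEY, bodytext TEXT)")
conn.execute("INSERT INTO content (pid, bodytext) VALUES (?, ?)", (10, ""))

# The statement text never contains the HTML, so nothing needs escaping.
conn.execute("UPDATE content SET bodytext = ? WHERE pid = ?", (html, 10))

stored = conn.execute(
    "SELECT bodytext FROM content WHERE pid = ?", (10,)
).fetchone()[0]
print(stored == html)
```

The value travels separately from the SQL text, so quotes and line breaks in the HTML are stored exactly as given.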
Thank you for your input, but I think I found a solution that works for me. It seems that you actually can tell the SQL server to accept a raw string with this kind of syntax:
SELECT q'[The 'end' of the day]'
(Source: https://www.databasestar.com/sql-escape-single-quote/)
So I did the following:
SELECT @html := '[<div class="one two three">Here comes a whole lot of HTML with all special chars and double quotes " and single quotes '' and empty lines and all possible kind of stuff...</div>]';
UPDATE content SET bodytext = @html WHERE pid = 10;
And it works that way without any escaping problems.
I have a small issue, so here's a bit of background:
We are developing a Qlik Sense application and we normally write our expressions to an external script. We save these as variables, and then evaluate the variables in the application. The advantage of this is a) we can use better version control with Git, and b) we can separate the queries from the application if we ever need to change platforms in the future.
My Problem:
I have come across a situation where we need to concatenate a string with the result of an expression. This is easy in the application, but when the expression is saved to an external file, the single quotes around the expression interfere with the single quotes around the string.
I tried:
using double quotes for the string only, but Qlik doesn't evaluate it correctly.
using double quotes for the expression only, with the same result.
escaping the single quotes inside the expression, e.g. "\'", but same story.
What I was thinking of doing next was replacing the inner quotes with a placeholder character so Qlik would treat them as plain text, then swapping them back to quotes later so Qlik would evaluate the expression.
Example Code:
SET variable = 'if(isnull(month),'Month: ' & date(now(), 'MMM-YYYY'),'Month: ' & only({$<year={2016}, month={6}>}month))';
After some further research I found that Qlik has its own way of escaping characters without using the "\" character. I was able to solve this issue by escaping the inner single quotes like this:
SET variable = 'if(isnull(month),''Month: '' & date(now(), ''MMM-YYYY''),''Month: '' & only({$<year={2016}, month={6}>}month))';
Feels like a pretty silly oversight now, but hopefully this will save someone some time in the future.
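If the external script is generated by a tool, the doubling can be automated. The following is a minimal sketch in Python that assumes doubling is the only escape needed, as described above; quote_literal and set_statement are hypothetical helper names, not part of Qlik.

```python
def quote_literal(text):
    """Wrap text in single quotes, doubling any single quotes inside it."""
    return "'" + text.replace("'", "''") + "'"


def set_statement(name, expression):
    """Build a SET line for a script file (hypothetical helper)."""
    return "SET " + name + " = " + quote_literal(expression) + ";"


expression = "'Month: ' & date(now(), 'MMM-YYYY')"
line = set_statement("variable", expression)
print(line)
# prints: SET variable = '''Month: '' & date(now(), ''MMM-YYYY'')';
```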
I am using VBA in MS Access to handle a long string (a memo field that originally stored HTML).
After locating positions with Instr(), I pass them to Mid(vStr,vStartPos,vEndPos-vStartPos+1) to extract the substring, but the output doesn't match. I have checked this carefully in the Immediate window as well as in Notepad++. Instr() and Notepad++ give the same positions, while Mid() gives different ones: its results start earlier than Instr()'s in some cases and later in others. I don't know the reason. I can only guess that Mid() uses a different mechanism or is defective (surprising!) when handling long strings that mix single-byte and double-byte characters (though that is common and has caused me no problems before), and possibly some special characters.
I believe I need to write a custom Mid() function. Any idea how to do it effectively and efficiently?
Thanks all for your replies. After I created a custom Mid() using RegEx and found that the problem was unchanged, I discovered the silly mistake I had made. Instr() and Mid() have no problem; the string had been carelessly modified between the two calls. So this case can be closed now.
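For anyone hitting the same symptom, here is a small Python sketch of the failure mode. The helpers instr and mid are hypothetical and only loosely modeled on the 1-based VBA functions (their argument order differs from VBA's), and the sample string is made up. Positions computed on one version of a string do not apply to a modified copy.

```python
def instr(text, needle, start=1):
    """1-based position of needle at or after start, or 0 if absent."""
    return text.find(needle, start - 1) + 1


def mid(text, start, length):
    """Substring of `length` characters beginning at 1-based `start`."""
    return text[start - 1:start - 1 + length]


html = "<p>価格: <b>100円</b></p>"
start_pos = instr(html, "<b>")
end_pos = instr(html, "</b>") + len("</b>") - 1

# Same formula as in the question: vEndPos - vStartPos + 1
extracted = mid(html, start_pos, end_pos - start_pos + 1)

# If the string changes after the positions were computed, they go stale.
modified = html.replace("価格", "Price")
stale = mid(modified, start_pos, end_pos - start_pos + 1)

print(extracted)  # <b>100円</b>
print(stale)      # no longer the <b> element
```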
I would like a regex that would make this:
VALUES('Hit 'n Run')
into
VALUES('Hit ''n Run')
Is this possible?
No, this is not really possible. If you have VALUES('Hit 'n Run'), you already have an invalid mixture of delimiting apostrophes and literal apostrophes. String processing is like mixing sugar and salt: once you've mixed contexts without proper escaping there is no way of pulling them back apart.
If you are trying to rescue broken data, you could try something like (?<!\()'(?!\)) to match apostrophes that don't have a bracket next to them. It's a weak and easily fooled tactic but for simple data it might work.
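A quick Python sketch of that rescue pattern, showing one input where it works and one where it is fooled. The function name double_inner_apostrophes is made up for this example.

```python
import re

# Apostrophes with no "(" directly before and no ")" directly after.
LONE_APOSTROPHE = re.compile(r"(?<!\()'(?!\))")


def double_inner_apostrophes(sql):
    """Double every apostrophe that is not next to a bracket."""
    return LONE_APOSTROPHE.sub("''", sql)


rescued = double_inner_apostrophes("VALUES('Hit 'n Run')")
print(rescued)  # VALUES('Hit ''n Run')

# Two values instead of one, and the delimiters themselves get doubled.
fooled = double_inner_apostrophes("VALUES('a', 'b')")
print(fooled)   # VALUES('a'', ''b')
```

The second call shows why this is only suitable for simple single-value data: any delimiter that is not adjacent to a bracket is treated as a literal apostrophe.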
If you are putting together dynamic SQL queries you must escape the ' before you put it into the query string, either using a simple string replace ' with '' if you're sure that's the only escape your DBMS requires, or — much better — using a dedicated SQL-string-literal-escaping function appropriate to your DBMS. Quite what that function would be depends on what platform (language, DBMS) you're talking about.
Any pattern that could be expressed in RegEx could then be exploited to create the very SQL injection issues you're trying to avoid.
Example nasty input:
VALUES(');DELETE FROM customer;SELECT '
I just revisited some of the books I used to pick up VB.NET. I am not sure I fully understand how StringBuilder works or what it is for.
What is the guidance for using it? Is it best to use it if you are concatenating 2 strings or 50?
Or when the total string length is greater than 128 characters?
Or will you see a performance benefit whenever you use it to add strings together?
In which case is it better to use a StringBuilder instance to build a SQL statement than string.format("Select * from x where y = {0}",1)?
It's always struck me that declaring another variable and including a name space is not beneficial for small string concatenations, but I am not sure now.
Sorry, a lot of documentation tells you what to use, just not what's best.
I've got an article on this very topic. In summary (copied from the bottom of the page):
Definitely use StringBuilder when you're concatenating in a non-trivial loop - especially if you don't know for sure (at compile time) how many iterations you'll make through the loop. For example, reading a file a character at a time, building up a string as you go using the += operator is potentially performance suicide.
Definitely use the concatenation operator when you can (readably) specify everything which needs to be concatenated in one statement. (If you have an array of things to concatenate, consider calling String.Concat explicitly - or String.Join if you need a delimiter.)
Don't be afraid to break literals up into several concatenated bits - the result will be the same. You can aid readability by breaking a long literal into several lines, for instance, with no harm to performance.
If you need the intermediate results of the concatenation for something other than feeding the next iteration of concatenation, StringBuilder isn't going to help you. For instance, if you build up a full name from a first name and a last name, and then add a third piece of information (the nickname, maybe) to the end, you'll only benefit from using StringBuilder if you don't need the (first name + last name) string for another purpose (as we do in the example which creates a Person object).
If you just have a few concatenations to do, and you really want to do them in separate statements, it doesn't really matter which way you go. Which way is more efficient will depend on the number of concatenations, the sizes of string involved, and what order they're concatenated in. If you really believe that piece of code to be a performance bottleneck, profile or benchmark it both ways.
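The same trade-off exists outside .NET. As an analogy only (Python has no StringBuilder), the sketch below shows the three usual ways to build a string in a loop in Python: repeated +=, collecting pieces in a list and joining once, and writing to an io.StringIO buffer. The function names are made up for this example, and no timing figures are claimed: CPython can sometimes optimize += in place, so measure before choosing.

```python
import io


def build_with_concat(pieces):
    """Repeated +=, the pattern the article warns about in long loops."""
    result = ""
    for piece in pieces:
        result += piece  # may copy the whole string on each iteration
    return result


def build_with_join(pieces):
    """Collect the pieces, then concatenate once at the end."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


def build_with_buffer(pieces):
    """Append to an in-memory text buffer, the closest analog to StringBuilder."""
    buffer = io.StringIO()
    for piece in pieces:
        buffer.write(piece)
    return buffer.getvalue()


pieces = [str(n) + "," for n in range(5)]
print(build_with_concat(pieces))
print(build_with_join(pieces) == build_with_buffer(pieces))
```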
Here is my rule of thumb:
StringBuilder is best used when the exact number of concatenations is unknown at compile time.
Coding Horror has a good article concerning this question, The Sad Tragedy of Micro-Optimization Theater.
Personally I use StringBuilder when I have more than just one or two strings to concatenate. I'm not sure if there's a real performance gain to be had, but I've always read and been told that doing a regular concatenation with multiple strings creates an extra copy of the string each time you do it, while using StringBuilder keeps one copy until you call the final ToString() method on it.
Someone's figured out experimentally that the critical number is 6. More than 6 concatenations in a row and you should use a StringBuilder. Can't remember where I found this.
However, note that if you just write this in a line:
"qwert" + "yuiop" + "asdf" + "gh" + "jkl;" + "zxcv" + "bnm" + ",."
That gets converted into one function call (I don't know how to write it in VB.NET):
String.Concat("qwert", "yuiop", "asdf", "gh", "jkl;", "zxcv", "bnm", ",.");
So if you're doing all concatenations on one line, then don't bother with StringBuilder because String.Concat effectively will do all the concatenations in one go. It's only if you're doing them in a loop or successively concatenating.
My rule - when you're adding to a string in a For or Foreach loop, use the StringBuilder.