Why are sql generators using double parenthesis in where clause?

I have worked with different kinds of auto-generated SQL statements, such as those from MS Access and Firebird. When I use query builders (Access or IBExpert) to generate these SQL snippets, they often emit more parentheses than needed.
I don't mean the extra parentheses around some boolean operations, but take for example the following:
select id, name from table as t
where ((t.id = @id))
When I remove them the query works perfectly fine. But why are they generated so often?

In this case, the brackets make no difference to the query.
I've seen this kind of thing before: the generator just throws them in because they do no harm and make the rendering code a lot simpler. When rendering a node in an AST, wrap it in brackets - simple.
Otherwise you may have to backtrack to correctly parenthesise OR conditions, for example:
WHERE ((A OR B) AND (C OR D)) // correct
vs
WHERE A OR B AND C OR D // incorrect: AND binds tighter than OR, so this means A OR (B AND C) OR D
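The "wrap every node in brackets" approach can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of Access, IBExpert, or any real query builder; the names `Cond`, `BoolOp`, `render`, and `where_clause` are made up for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Cond:
    """A leaf condition such as `t.id = @id` (hypothetical node type)."""
    text: str


@dataclass
class BoolOp:
    """A binary boolean node; op is "AND" or "OR" (hypothetical node type)."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Cond, BoolOp]


def render(node: Node) -> str:
    # Every node is wrapped in parentheses, leaf or not. The renderer never
    # has to reason about operator precedence, at the cost of extra brackets.
    if isinstance(node, Cond):
        return f"({node.text})"
    return f"({render(node.left)} {node.op} {render(node.right)})"


def where_clause(node: Node) -> str:
    # The clause itself adds one more pair, which is where the
    # double parentheses around a single condition come from.
    return f"WHERE ({render(node)})"


tree = BoolOp("AND",
              BoolOp("OR", Cond("A"), Cond("B")),
              BoolOp("OR", Cond("C"), Cond("D")))

print(where_clause(Cond("t.id = @id")))
print(where_clause(tree))
```

A renderer that emitted only the necessary brackets would have to compare each node's operator with its parent's, which is exactly the bookkeeping this shortcut avoids.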

Related

How to separate the parameters in an SQL query and push them into an array to avoid SQL injection

SELECT * FROM table1 WHERE year_month BETWEEN '2021-08' AND '2022-01';
update table2 set note_description = 'test #8:57am', patient_id = '5840', note_updated_by = '10000019', note_update_date = '2022-07-13 09:45:49' where note_id = '639'
Right now my backend queries can be attacked by SQL injection, and I want to prevent that.
In the queries above I want to separate the parameters from the query text and replace them with special placeholder characters so that I can avoid SQL injection. Is there a package or anything else that does this?

If you have received the SQL statement with the parameters already concatenated in, then this is the wrong place to fix your issue - there’s no way to safely parse the statement and separate out the parameters from the query.
You should find the place in the code where the parameters are concatenated into the statement and use Prepared Statements/Parameterized Queries there to safely pass/bind the parameters.
If that’s not possible (for example because the code is structured to only pass along the statement) a less desirable alternative is to encode/enquote the parameters before concatenating them in, while ensuring they are all quoted in the statement. How you do that part will depend on the database / language being used.
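As a minimal sketch of what binding parameters looks like, here is the UPDATE from the question rewritten with placeholders. The question does not say which language or driver the backend uses, so Python's standard `sqlite3` module and an in-memory database stand in here as an assumption; the function name `update_note` and the reduced table layout are hypothetical. Other drivers use different placeholder styles (for example `%s` or `:name`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, assumed version of the table from the question.
conn.execute(
    "CREATE TABLE table2 ("
    "note_id INTEGER PRIMARY KEY, note_description TEXT, patient_id TEXT)"
)
conn.execute(
    "INSERT INTO table2 (note_id, note_description, patient_id) "
    "VALUES (639, 'old text', '5840')"
)
conn.execute(
    "INSERT INTO table2 (note_id, note_description, patient_id) "
    "VALUES (640, 'other note', '5841')"
)


def update_note(conn, note_id, description):
    # The SQL text is a constant; the values travel separately and are bound
    # to the ? placeholders by the driver, so they are never parsed as SQL.
    conn.execute(
        "UPDATE table2 SET note_description = ? WHERE note_id = ?",
        (description, note_id),
    )


hostile = "x' WHERE 1=1; --"
update_note(conn, 639, hostile)

stored = conn.execute(
    "SELECT note_description FROM table2 WHERE note_id = ?", (639,)
).fetchone()[0]
untouched = conn.execute(
    "SELECT note_description FROM table2 WHERE note_id = ?", (640,)
).fetchone()[0]
print(stored, untouched)
```

The hostile string is stored verbatim as data and the second row is left alone, which is the behaviour string concatenation cannot guarantee.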

I've seen one product that does this: pt-query-digest. It's a free tool that parses the MySQL query log, and produces reports of aggregate time spent running each query. To do this, it must establish a query "fingerprint" which allows it to group queries that are the same except for constant values. Like SELECT * FROM mytable WHERE id = 123 has the same fingerprint as SELECT * FROM mytable WHERE id = 456.
This means it must parse the queries and replace each constant value, like a numeric or string literal, with a placeholder ?. In cases of IN() predicates, it replaces the list of values with ?+. Also it reduces whitespace and removes comments.
It's a non-trivial amount of code, about 100 lines of Perl: https://github.com/percona/percona-toolkit/blob/3.x/lib/QueryRewriter.pm#L139-L248
Even so, the function is preceded by a comment in which the developers acknowledge that it is not perfect and may miss some cases. Implementing a recursive-descent parser using regular expressions is neither efficient nor correct.
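To give a feel for the fingerprinting idea, here is a deliberately naive sketch in Python. It is not a port of pt-query-digest's Perl code and covers far fewer cases; the function name `fingerprint` and the exact output format (such as `in(?+)`) are choices made for this example only.

```python
import re


def fingerprint(sql: str) -> str:
    """Rough query fingerprint: literals become ?, whitespace is collapsed."""
    s = re.sub(r"/\*.*?\*/", " ", sql, flags=re.S)   # drop /* ... */ comments
    s = re.sub(r"'(?:[^'\\]|\\.|'')*'", "?", s)      # single-quoted strings
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)         # numeric literals
    # Collapse a list of placeholders inside IN (...) into a single ?+
    s = re.sub(r"\bin\s*\(\s*\?(?:\s*,\s*\?)*\s*\)", "in(?+)", s, flags=re.I)
    return re.sub(r"\s+", " ", s).strip().lower()    # normalise whitespace/case


a = fingerprint("SELECT * FROM mytable WHERE id = 123")
b = fingerprint("SELECT * FROM mytable  WHERE id = 456 /* hint */")
c = fingerprint("select * from t where id IN (1, 2, 3)")
print(a, b, c, sep="\n")
```

Even this toy version shows the weakness: a comment marker inside a string literal, or a dialect-specific quoting rule, is enough to produce a wrong fingerprint, which is fine for grouping log entries but not a basis for security.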
But this is probably not what you want to do anyway. You shouldn't be starting from a query with constant values and making them into a parameterized query. You should design parameterized queries yourself, as needed.
Not every constant value in an SQL query necessarily must be parameterized. Only the ones that aren't fixed values. That is, if you need to combine a variable from your client code into the SQL query string, and you can't guarantee that the variable is safe, then use a parameter. If a query has a constant value that is fixed (not interpolated from a variable), then it can remain in the query. If a query has a value that comes from a variable, but that variable is known to be safe, and never can be tainted by untrusted input, then it can remain in the query.
It's more reliable and economical for you to make these judgments. You know the code and the context much better than any automated system can.

Dynamic SQL queries with F# 3.0?

I have tried to use FLINQ but it is rather out of date with F# 3.0 beta.
Can someone give me some pointers on how to create dynamic SQL queries in F#?

We have recently developed a library, FSharpComposableQuery, aimed at supporting more flexible composition of query expressions in F# 3.0 and above. It's intended as a drop-in replacement overloading the standard query builder.
Tomas's example can be modified as follows:
open FSharpComposableQuery
// Initial query that simply selects products
let q1 =
    <@ query { for p in ctx.Products do
               select p } @>
// Create a new query that specifies only expensive products
let q2 =
    query { for p in %q1 do
            where (p.UnitPrice.Value > 100.0M) }
This simply quotes the query expression and splices it into the second query. However, this results in a quoted query expression that the default QueryBuilder may not be able to turn into a single query, because q2 evaluates to the (equivalent) expression
query { for p in (query { for p in ctx.Products do
                          select p }) do
        where (p.UnitPrice.Value > 100.0M) }
which (as in Tomas's original code) will likely be evaluated by loading all of the products into memory, and doing the selection in memory, whereas what we really want is something like:
query { for p in ctx.Products do
        where (p.UnitPrice.Value > 100.0M) }
which will turn into an SQL selection query. FSharpComposableQuery overrides the QueryBuilder to perform this, among other, transformations. So, queries can be composed using quotation and antiquotation more freely.
The project home page is here: http://fsprojects.github.io/FSharp.Linq.ComposableQuery/
and there is some more discussion in an answer I just provided to another (old) question about dynamic queries: How do you compose query expressions in F#?
Comments or questions (especially if something breaks or something that you think should work doesn't) are very welcome.
[EDIT: Updated the links to the project pages, which have just been changed to remove the word "Experimental".]

In F# 3.0, the query is quoted automatically and so you cannot use quotation splicing (the <@ foo %bar @> syntax) that makes composing queries possible. Most of the things that you could write by composing queries using splicing can still be done in the "usual LINQ way" by creating a new query from the previous source and adding, for example, filtering:
// Initial query that simply selects products
let q1 =
    query { for p in ctx.Products do
            select p }
// Create a new query that specifies only expensive products
let q2 =
    query { for p in q1 do
            where (p.UnitPrice.Value > 100.0M) }
This way, you can dynamically add conditions, dynamically specify projection (using select) and do a couple of other query compositions. However, you don't get the full flexibility of composing queries as with explicit quotations. I guess this is the price that F# 3.0 has to pay for a simpler syntax similar to what exists in C#.
In principle, you should be able to write the query explicitly using the query.Select (etc.) operators. This would be written using explicit quotations and so you should be able to use splicing. However, I don't exactly know how the translation works, so I can't give you a working sample. Something like this should work (but the syntax is very ugly, so it is probably better to just use strings or some other techniques):
<@ query.Select(Linq.QuerySource<_, _>(ctx.Products), fun prod ->
    // You could use splicing here, for example, if 'projection' is
    // a quotation that specifies the projection, you could write:
    // %projection
    prod.ProductName) @>
|> query.Run
The queries in F# 3.0 are based on IQueryable, so it might be possible to use the same trick as the one that I implemented for C#. However, I guess that some details would be different, so I wouldn't expect that to work straight away. The best implementation of that idea is in LINQKit, but I think it won't directly work in F#.
So, in general, I think the only case that works well is the first example - where you just apply additional query operators to the query by writing multiple queries.

Expression Too Complex In Access 2007

When I try to run this query in Access through the ODBC interface into a MySQL database I get an "Expression too complex in query expression" error. The essential thing I'm trying to do is translate abbreviated names of languages into their full body English counterparts. I was curious if there was some way to "trick" access into thinking the expression is smaller with sub queries, or if someone else had a better idea of how to solve this problem. I thought about making a temporary table and doing a join on it, but that's not supported in Access SQL.
Just as an FYI, the query worked fine until I added the big long IIf chain. I tested the query on a smaller IIf chain for three languages, and that wasn't an issue, so the problem definitely stems from the huge IIf chain (it's 26 deep). Also, I might be able to drop some of the options (like combining the different forms of Chinese or Portuguese).
As a test, I was able to get the SQL query to work after paring it down to 14 IIf() calls, but that's a far cry from the 26 languages I'd like to represent.
SELECT TOP 5 Count( * ) AS [Number of visits by language], IIf(login.lang="ar","Arabic",IIf(login.lang="bg","Bulgarian",IIf(login.lang="zh_CN","Chinese (Simplified Han)",IIf(login.lang="zh_TW","Chinese (Traditional Han)",IIf(login.lang="cs","Czech",IIf(login.lang="da","Danish",IIf(login.lang="de","German",IIf(login.lang="en_US","United States English",IIf(login.lang="en_GB","British English",IIf(login.lang="es","Spanish",IIf(login.lang="fr","French",IIf(login.lang="el","Greek",IIf(login.lang="it","Italian",IIf(login.lang="ko","Korean",IIf(login.lang="hu","Hungarian",IIf(login.lang="nl","Dutch",IIf(login.lang="pl","Polish",IIf(login.lang="pt_PT","European Portuguese",IIf(login.lang="pt_BR","Brazilian Portuguese",IIf(login.lang="ru","Russian",IIf(login.lang="sk","Slovak",IIf(login.lang="sl","Slovenian",IIf(login.lang="fi","Finnish",IIf(login.lang="sv","Swedish",IIf(login.lang="tr","Turkish","Unknown"))))))))))))))))))))))))) AS [Language]
FROM login, reservations, reservation_users, schedules
WHERE (reservations.start_date Between DATEDIFF('s','1970-01-01 00:00:00',[Starting Date in the Following Format YYYY/MM/DD]) And DATEDIFF('s','1970-01-01 00:00:00',[Ending Date in the Following Format YYYY/MM/DD])) And reservations.is_blackout=0 And reservation_users.memberid=login.memberid And reservation_users.resid=reservations.resid And reservation_users.invited=0 And reservations.scheduleid=schedules.scheduleid And scheduletitle=[Schedule Title]
GROUP BY login.lang
ORDER BY Count( * ) DESC;
@Michael Todd
I completely agree. The list of languages should have been a table in the database and the login.lang should have been a FK into that table. Unfortunately this isn't how the database was written, and it's not really mine to modify. The languages are placed into the login.lang field by the PHP running on top of the database.

I thought about making a temporary table and doing a join on it, but that's not supported in Access SQL.
Did you try making a table of languages within Access, and joining it to the MySQL tables?
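The lookup-table idea can be tried out quickly. The sketch below uses Python's `sqlite3` module with an in-memory database purely as a stand-in, since the real setup (an Access table joined to linked MySQL tables) cannot be reproduced here; the table `languages` and its columns `code` and `name` are hypothetical names, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE login (memberid INTEGER, lang TEXT);
    CREATE TABLE languages (code TEXT PRIMARY KEY, name TEXT);
    INSERT INTO login VALUES (1, 'de'), (2, 'de'), (3, 'fr'), (4, 'xx');
    INSERT INTO languages VALUES ('de', 'German'), ('fr', 'French');
""")

# One LEFT JOIN replaces the whole nested IIf chain; codes that are missing
# from the lookup table fall back to 'Unknown'.
rows = conn.execute("""
    SELECT COALESCE(languages.name, 'Unknown') AS lang_name,
           COUNT(*) AS visits
    FROM login
    LEFT JOIN languages ON languages.code = login.lang
    GROUP BY lang_name
    ORDER BY visits DESC, lang_name
""").fetchall()

for lang_name, visits in rows:
    print(lang_name, visits)
```

Adding a 27th language then means inserting one row rather than nesting the expression one level deeper. In Access itself the fallback would be written with Nz() or IIf(IsNull(...)) rather than COALESCE.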

You may try the expression below. I split your expression into two parts, and a final IIf check combines them. You will get two additional fields (l1 and l2), which you may ignore. I had the same situation and this worked well for me. PS: You may need to double-check the closing brackets in the expression below; I wrote it quickly.
Thanks,
Shibin
IIf(login.lang="ar","Arabic",IIf(login.lang="bg","Bulgarian",IIf(login.lang="zh_CN","Chinese (Simplified Han)",IIf(login.lang="zh_TW","Chinese (Traditional Han)",IIf(login.lang="cs","Czech",IIf(login.lang="da","Danish",IIf(login.lang="de","German",IIf(login.lang="en_US","United States English",IIf(login.lang="en_GB","British English",IIf(login.lang="es","Spanish",IIf(login.lang="fr","French",IIf(login.lang="el","Greek",IIf(login.lang="it","Italian",""))))))))))))) as l1,
IIf(login.lang="ko","Korean",IIf(login.lang="hu","Hungarian",IIf(login.lang="nl","Dutch",IIf(login.lang="pl","Polish",IIf(login.lang="pt_PT","European Portuguese",IIf(login.lang="pt_BR","Brazilian Portuguese",IIf(login.lang="ru","Russian",IIf(login.lang="sk","Slovak",IIf(login.lang="sl","Slovenian",IIf(login.lang="fi","Finnish",IIf(login.lang="sv","Swedish",IIf(login.lang="tr","Turkish","Unknown")))))))))))) as l2,
IIf(l1="",l2,l1) AS [Language]

If you can't use a lookup table, create a custom VB function, so that instead of 26 IIf statements, you have one function call.

Access is re-writing - and breaking - my query!

I have a query in MS Access (2003) that makes use of a subquery. The subquery part looks like this:
...FROM (SELECT id, dt, details FROM all_recs WHERE def_cd="ABC-00123") AS q1,...
And when I switch to Table View to verify the results, all is OK.
Then, I wanted the result of this query to be printed on the page header for a report (the query returns a single row that is page-header stuff). I get an error because the query is suddenly re-written as:
...FROM [SELECT id, dt, details FROM all_recs WHERE def_cd="ABC-00123"; ] AS q1,...
So it's Ok that the round brackets are automatically replaced by square brackets, Access feels it needs to do that, fine! But why is it adding the ; into the subquery, which causes it to fail?
I suppose I could just create new query objects for these subqueries, but it seems a little silly that I should have to do that.

Ah, the joys of Access. The query designer in general does not play well with derived tables. There are more than a few constructs in fact, that Jet will honor that cannot be viewed properly in the query designer. In fact, the QBE will mangle (alter as you have seen) many of these complex queries. In general, you should simply assume that you cannot safely view the design of a derived table or "complex" query in the QBE but instead only in code.

If you want to use the more standard derived-table syntax, you need to switch to SQL 92 mode. However, beware that this also changes your wildcards to be SQL Server-compatible (% and _ instead of * and ?).
As @HansUp points out, the error in your SQL is not the ";" but the lack of the trailing period after the closing square bracket. That syntax has been part of Jet for as long as I've been using derived tables (which would be back to A97 or so). It has the flaw of preventing any expressions inside the derived-table SQL that require square brackets (such as field names with spaces in them), but I don't think that's a terrible flaw as I avoid naming things in ways that require square brackets.
EDIT:
Also note that SQL 92 mode has other problems, outlined in the edit at the end of this post of mine.

I have seen what you described where Access replaces subquery parentheses with square brackets. However I have never noticed it adding in a semicolon after the subquery.
Another detail is that, with square brackets, your query will follow this pattern:
... FROM [ SELECT whatever FROM someTable ]. AS q ...
Notice the dot immediately after the closing square bracket. Your sample didn't include a dot. So I wonder what might happen if you add the dot and remove the semicolon (in SQL View) like this:
...FROM [SELECT id, dt, details FROM all_recs WHERE def_cd="ABC-00123" ]. AS q1,...
Does Access accept that change, and is that preserved when you make any further changes through the Query Designer?

Regular expression to match common SQL syntax?

I was writing some Unit tests last week for a piece of code that generated some SQL statements.
I was trying to figure out a regex to match SELECT, INSERT and UPDATE syntax so I could verify that my methods were generating valid SQL, and after 3-4 hours of searching and messing around with various regex editors I gave up.
I managed to get partial matches but because a section in quotes can contain any characters it quickly expands to match the whole statement.
Any help would be appreciated, I'm not very good with regular expressions but I'd like to learn more about them.
By the way it's C# RegEx that I'm after.
Clarification
I don't want to need access to a database, as this is part of a unit test and I don't want to have to maintain a database to test my code, which may live longer than the project.

Regular expressions can only match languages that a finite-state automaton can recognize, which is a very limited class, whereas SQL has a recursive, nested syntax. It can be demonstrated that you can't validate SQL with a regex, so you can stop trying.

SQL is a type-2 grammar; it is too powerful to be described by regular expressions. It's the same as if you decided to generate C# code and then validate it without invoking a compiler. A database engine is in general too complex to be easily stubbed.
That said, you may try ANTLR's SQL grammars.

As far as I know this is beyond regex, and you're getting close to the dark arts of BNF and compilers.
http://savage.net.au/SQL/
The same thing happens to people who want to do correct syntax highlighting. You start cramming things into regex and then you end up writing a compiler...

I had the same problem. An approach that would work for all the more standard SQL statements would be to spin up an in-memory SQLite database and issue the query against it; if you get back a "table does not exist" error, then your query parsed properly.
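Here is a minimal sketch of that in-memory SQLite check, written in Python with the standard `sqlite3` module rather than C# so that it is self-contained. The function name `parses_ok` is made up, it handles one statement at a time, and matching on the error text is an assumption about SQLite's messages ("no such table", "no such column") rather than a documented contract. It also only tells you what SQLite's dialect accepts, which may differ from your target database.

```python
import sqlite3


def parses_ok(sql: str) -> bool:
    """Return True if SQLite's parser accepts `sql`, even with no schema."""
    conn = sqlite3.connect(":memory:")
    try:
        # EXPLAIN compiles the statement without running it.
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.OperationalError as exc:
        # A missing table or column means parsing succeeded and only name
        # resolution failed; anything else is treated as a syntax problem.
        return str(exc).startswith(("no such table", "no such column"))
    finally:
        conn.close()


good = parses_ok("select * from non_existant_table")
bad = parses_ok("select * frm non_existant_table")
print(good, bad)
```

In a unit test this keeps the database dependency down to a library call, with nothing to install or maintain alongside the project.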

Off the top of my head: Couldn't you pass the generated SQL to a database and use EXPLAIN on them and catch any exceptions which would indicate poorly formed SQL?

Have you tried the lazy selectors? Rather than match as much as possible, they match as little as possible, which is probably what you need for quotes.

To validate the queries, just run them with SET NOEXEC ON; that is how Enterprise Manager does it when you parse a query without executing it.
Besides, if you are using regex to validate SQL queries, you can be almost certain that you will miss some corner cases, or that the query is invalid for other reasons even if it is syntactically correct.

I suggest creating a database with the same schema, possibly using an embedded sql engine, and passing the sql to that.

I don't think that you even need to have the schema created to be able to validate the statement, because the system will not try to resolve object_name etc until it has successfully parsed the statement.
With Oracle as an example, you would certainly get an error if you did:
select * from non_existant_table;
In this case, "ORA-00942: table or view does not exist".
However if you execute:
select * frm non_existant_table;
Then you'll get a syntax error, "ORA-00923: FROM keyword not found where expected".
It ought to be possible to classify errors into syntax parsing errors that indicate incorrect syntax and errors relating to table names, permissions, etc.
Add to that the problem of different RDBMSs and even different versions allowing different syntaxes and I think you really have to go to the db engine for this task.

There are ANTLR grammars to parse SQL. It's really a better idea to use an in memory database or a very lightweight database such as sqlite. It seems wasteful to me to test whether the SQL is valid from a parsing standpoint, and much more useful to check the table and column names and the specifics of your query.

The best way is to validate the parameters used to create the query, rather than the query itself. A function that receives the variables can check the length of the strings, valid numbers, valid emails or whatever. You can use regular expressions to do these validations.

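// Requires: using System.Text.RegularExpressions;
public bool IsValid(string sql)
{
    // Crude shape check only: this does not validate SQL syntax.
    string pattern = @"SELECT\s.*FROM\s.*WHERE\s.*";
    Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
    return rgx.IsMatch(sql);
}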

I am assuming you did something like .* to match the quoted section; try [^"]* instead, which will keep you from eating the whole line. It will still give false positives in cases where you have a backslash-escaped quote (\") inside your strings.
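The difference between the greedy pattern, a lazy quantifier, and the negated character class is easy to see on a small input. The sketch below uses Python's `re` module; these three patterns are written the same way in .NET regular expressions, but the sample string is invented for the example.

```python
import re

sql = 'SELECT "first name", "last name" FROM people'

# Greedy: .* runs on to the LAST quote, swallowing everything in between.
greedy = re.findall(r'".*"', sql)

# Lazy: .*? stops at the first closing quote it can reach.
lazy = re.findall(r'".*?"', sql)

# Negated class: [^"]* can never cross a quote character at all.
negated = re.findall(r'"[^"]*"', sql)

print(greedy)
print(lazy)
print(negated)
```

Both fixes isolate each quoted section, but neither understands escaped quotes, which is one more reason these patterns only approximate SQL.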
