How do I get a list with all reserved words in SQL::Parser? - sql

with
#!/usr/bin/perl
use warnings;
use strict;
use 5.010;
use SQL::Parser;
my $parser = SQL::Parser->new( 'ANSI', {RaiseError=>1} );
my $word = 'BETWEEN';
my $success = $parser->feature( 'reserved_words', $word );
$success = $success ? '' : 'NOT';
say "$word is $success a reserved word";
I can check if a word is a reserved word.
Is there a function that gives me a list of all reserved words?

SQL::Dialects::ANSI is a simple interface to information about ANSI SQL in INI format. So you get that and parse it... except its not in INI format because it doesn't contain key = value but just key which chokes Config::INI. Alas, this is one of the few INI parsers I could find that will deal with a string.
So you might have to parse it by hand. That's what SQL::Parser does.
Alternatively you can pull the list out of SQL::Parser's guts.
use Data::Dumper;
use SQL::Parser;
my $s = SQL::Parser->new;
print Dumper $s->{opts}{reserved_words};
This is a hack and will eventually fail.
As per my comments above, the list of ANSI SQL reserved words (hey, which version of ANSI SQL?) is not definitive. The database itself may reserve additional words. And different versions may reserve different words. If you can find a way to do whatever it is you're doing that doesn't rely on a list of reserved words, do that.

Michael gave me this page via RT.
How about a method of SQL::Parser which returns all features of a given class (e.g. 'features($)'), analog to feature()? Would that help you? If yes, please open a feature request on CPAN against SQL::Statement.
I wouldn't break working interfaces others rely on without a good reason - missing feature is not a good reason.

If you don't want it programmatically, ignore everything I just said about SQL::Dialects and read the standard. Always check the standard as derivative works may be incomplete, incorrect or working off dialects. Even though there are updated ANSI SQL standards, most databases use some mutation of SQL-92 or elements of SQL:1999. After that they got silly.
I can't find a copy of the SQL:1999 standard, but here's a BNF which looks very thorough.

Related

Why not have operators as both keywords and functions?

I saw this question and it got me wondering.
Ignoring the fact that pretty much all languages have to be backwards compatible, is there any reason we cannot use operators as both keywords and functions, depending on if it's immediately followed by a parenthesis? Would it make the grammar harder?
I'm thinking mostly of python, but also C-like languages.
Perl does something very similar to this, and the results are sometimes surprising. You'll find warnings about this in many Perl texts; for example, this one comes from the standard distributed Perl documentation (man perlfunc):
Any function in the list below may be used either with or without parentheses around its arguments. (The syntax descriptions omit the parentheses.) If you use parentheses, the simple but occasionally surprising rule is this: It looks like a function, therefore it is a function, and precedence doesn't matter. Otherwise it's a list operator or unary operator, and precedence does matter. Whitespace between the function and left parenthesis doesn't count, so sometimes you need to be careful:
print 1+2+4; # Prints 7.
print(1+2) + 4; # Prints 3.
print (1+2)+4; # Also prints 3!
print +(1+2)+4; # Prints 7.
print ((1+2)+4); # Prints 7.
An even more surprising case, which often bites newcomers:
print
(a % 7 == 0 || a % 7 == 1) ? "good" : "bad";
will print 0 or 1.
In short, it depends on your theory of parsing. Many people believe that parsing should be precise and predictable, even when that results in surprising parses (as in the Python example in the linked question, or even more famously, C++'s most vexing parse). Others lean towards Perl's "Do What I Mean" philosophy, even though the result -- as above -- is sometimes rather different from what the programmer actually meant.
C, C++ and Python all tend towards the "precise and predictable" philosophy, and they are unlikely to change now.
Depending on the language, not() is not defined. If not() is not defined in some language, you can not use it. Why not() is not defined in some language? Because creator of that language probably had not need this type of language construction. Because it is better to let things be simpler.

Is there an APL idiom to get a vector of all alphabetical characters?

I know you can get a character vector of all numbers with ∊⍕¨⍳10, but is there a platform independent idiom for getting a vector of all alphabetical characters, aside from manually typing 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'? I know that I can do ⎕AV[(⍳26)+(⎕AV⍳'a')-1] to get all lowercase characters (and uppercase by changing the 'a' to 'A') in Dyalog APL, but I presume the system variable ⎕AV isn't available in other environments.
Not really.
In Dyalog APL, what I generally do is use ⎕A for the uppercase characters and ⎕UCS 96+⍳26 for the lowercase characters. (And ⎕A,⎕UCS 96+⍳26 for the whole alphabet.)
⎕AV is usually present, but its contents are not standard. (For example, NARS2000's ⎕AV is different from Dyalog's.) By the way, in Dyalog ⎕AV is considered deprecated in favour of ⎕UCS. Any APL that implements ⎕UCS will do it the same way, because Unicode is a set standard.
If you want a guaranteed implementation-independent, readable way to define the alphabet I would indeed recommend to just store abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ in your workspace.
However, I would not recommend trying to write implementation-independent APL code to begin with. APL dialects are rather divergent, so this is decidedly nontrivial (if possible at all for complex code), and will be difficult to maintain.
Even though Quad names (⎕xxx) are usually case insensitive, MicroAPL distinguishes between ⎕A and ⎕a, so ⎕a,⎕A gives 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.
Yes, in the latest versions*, write ⎕A,819⌶⎕A, for ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.
Try it online!
Documentation.
* Latest builds of 14.1, and all versions of 15.0 and up.

How to quote values for LuaSQL?

LuaSQL, which seems to be the canonical library for most SQL database systems in Lua, doesn't seem to have any facilities for quoting/escaping values in queries. I'm writing an application that uses SQLite as a backend, and I'd love to use an interface like the one specified by Python's DB-API:
c.execute('select * from stocks where symbol=?', t)
but I'd even settle for something even dumber, like:
conn:execute("select * from stocks where symbol=" + luasql.sqlite.quote(t))
Are there any other Lua libraries that support quoting for SQLite? (LuaSQLite3 doesn't seem to.) Or am I missing something about LuaSQL? I'm worried about rolling my own solution (with regexes or something) and getting it wrong. Should I just write a wrapper for sqlite3_snprintf?
I haven't looked at LuaSQL in a while but last time I checked it didn't support it. I use Lua-Sqlite3.
require("sqlite3")
db = sqlite3.open_memory()
db:exec[[ CREATE TABLE tbl( first_name TEXT, last_name TEXT ); ]]
stmt = db:prepare[[ INSERT INTO tbl(first_name, last_name) VALUES(:first_name, :last_name) ]]
stmt:bind({first_name="hawkeye", last_name="pierce"}):exec()
stmt:bind({first_name="henry", last_name="blake"}):exec()
for r in db:rows("SELECT * FROM tbl") do
print(r.first_name,r.last_name)
end
LuaSQLite3 as well an any other low level binding to SQLite offers prepared statements with variable parameters; these use methods to bind values to the statement parameters. Since SQLite does not interpret the binding values, there is simply no possibility of an SQL injection. This is by far the safest (and best performing) approach.
uroc shows an example of using the bind methods with prepared statements.
By the way in Lua SQL there is an undocumented escape function for the sqlite3 driver in conn:escape where conn is a connection variable.
For example with the code
print ("con:escape works. test'test = "..con:escape("test'test"))
the result is:
con:escape works. test'test = test''test
I actually tried that to see what it'd do. Apparently there is also such a function for their postgres driver too. I found this by looking at the tests they had.
Hope this helps.

Split SQL statements

I am writing a backend application which needs to be able to send multiple SQL commands to a MySQL server.
MySQL >= 5.x support multiple statements, but unfortunately we are interfacing with MySQL 4.x.
I am trying to find a way (hint: regex) to split SQL statements by their semicolon, but it should ignore semicolons in single and double quotes strings.
http://www.dev-explorer.com/articles/multiple-mysql-queries has a very nice regex to do that, but doesn't support double quotes.
I'd be happy to hear your suggestions.
Can't be done with regex, it's insufficiently powerful to parse SQL. There may be an SQL parser available for your language — which is it? — but parsing SQL is quite hard, especially given the range of different syntaxes available. Even in MySQL alone there are many SQL_MODE flags on a server and connection level that can affect how basic strings and comments are parsed, making statements behave quite differently.
The example at dev-explorer goes to amusing lengths to try to cope with escaped apostrophes and trailing strings, but will still fail for many valid combinations of them, not to mention the double quotes, backticks, the various comment syntaxes, or ANSI SQL_MODE.
As bobince said, regular expressions are probably not going to be powerful enough to do this. They're certainly not going to be powerful enough to do it in any halfway elegant manner. The second link cdonner provided also does not address this; most answers there were trying to talk the questioner out of doing this without semicolons; if he had taken the general advice, then he'd have ended up where you are.
I think the quickest path to solving this is going to be with a string scanner function, that examines every character of the string in sequence, and reacts based on a bit of stored state. Rough pseudocode:
Read in a character
If the character is not special, CONTINUE
If the character is escaped (checking this probably requires examining the previous character), CONTINUE
If the character would start a new string or end an existing one, toggle a flag IN_STRING (you might need multiple flags for different string types... I've honestly tried and succeeded at remaining ignorant of the minutiae of SQL quoting/escaping) and CONTINUE
If the character is a semicolon AND we are not currently in a string, we have found a query! OUTPUT it and CONTINUE scanning until the end of the string.
Language parsing is not any of my areas of experience, so you'll want to consider that approach carefully; nonetheless, it's going to be fast (with C-style strings, none of those steps are at all expensive, save possibly for the OUTPUT, depending on what "outputting" means in your context) and I think it should get the job done.
maybe with the following Java Regexp? check the test...
#Test
public void testRegexp() {
String s = //
"SELECT 'hello;world' \n" + //
"FROM DUAL; \n" + //
"\n" + //
"SELECT 'hello;world' \n" + //
"FROM DUAL; \n" + //
"\n";
String regexp = "([^;]*?('.*?')?)*?;\\s*";
assertEquals("<statement><statement>", s.replaceAll(regexp, "<statement>"));
}
I would suggest seeing if you can redefine the problem space so the need to send multiple queries separated only by their terminator is not required.
Try this. Just replaced the 1st ' with \" and it seems to work for both ' and "
;+(?=([^\"|^\\']['|\\'][^'|^\\']['|\\'])[^'|^\\'][^'|^\\']$)

How to make Lucene match all words in query?

I am using Lucene to allow a user to search for words in a large number of documents. Lucene seems to default to returning all documents containing any of the words entered.
Is it possible to change this behaviour? I know that '+' can be use to force a term to be included but I would like to make that the default action.
Ideally I would like functionality similar to Google's: '-' to exclude words and "abc xyz" to group words.
Just to clarify
I also thought of inserting '+' into all spaces in the query. I just wanted to avoid detecting grouped terms (brackets, quotes etc) and potentially breaking the query. Is there another approach?
This looks similar to the Lucene Sentence Search question. If you're interested, this is how I answered that question:
String defaultField = ...;
Analyzer analyzer = ...;
QueryParser queryParser = new QueryParser(defaultField, analyzer);
queryParser.setDefaultOperator(QueryParser.Operator.AND);
Query query = queryParser.parse("Searching is fun");
Like Adam said, there's no need to do anything to the query string. QueryParser's setDefaultOperator does exactly what you're asking for.
Why not just preparse the user search input and adjust it to fit your criteria using the Lucene query syntax before passing it on to Lucene. Alternatively, you could just create some help documentation on how to use the standard syntax to create a specific query and let the user decide how the query should be performed.
Lucene has a extensive query language as described here that describes everything you want except for + being the default but that's something you can simple handle by replacing spaces with +. So the only thing you need to do is define the format you want people to enter their search queries in (I would strongly advise to adhere to the default Lucene syntax) and then you can write the transformations from your own syntax to the Lucene syntax.
The behavior is hard-coded in method addClause(List, int, int, Query) of class org.apache.lucene.queryParser.QueryParser, so the only way to change the behavior (other than the workarounds above) is to change that method. The end of the method looks like this:
if (required && !prohibited)
clauses.addElement(new BooleanClause(q, BooleanClause.Occur.MUST));
else if (!required && !prohibited)
clauses.addElement(new BooleanClause(q, BooleanClause.Occur.SHOULD));
else if (!required && prohibited)
clauses.addElement(new BooleanClause(q, BooleanClause.Occur.MUST_NOT));
else
throw new RuntimeException("Clause cannot be both required and prohibited");
Changing "SHOULD" to "MUST" should make clauses (e.g. words) required by default.