Below is a simplified example of my data. As you can see, there are just two rows here.
So I run the query below and suddenly get an unexpected result:
What I expected was something like:
Why am I getting the wrong result?
Moreover, when I run the query below, I get only one row. Why is the second row with id=1 not showing?
Is this a BigQuery bug or what?
Disclaimer: I was asked exactly this type of question a few times offline (outside of StackOverflow) and recently saw the very same question on SO (I can't understand this BigQuery magic. find string with LIKE), but unfortunately it was deleted, so I decided to post this on my own.
The reason GROUP BY does not group those two rows is that the str values in those rows are actually different. Unfortunately, the BigQuery Web UI collapses spaces in the result panel when it is in Table mode. To see the real, original values you can switch to JSON mode, as below
The same reason explains the unexpected result with LIKE.
As for how to deal with this: it depends!
For example, you can normalize your strings by collapsing the extra spaces yourself, as shown below
P.S. In our internal tools we fixed the issue with suppressed spaces and simply show all spaces:
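The normalization idea can be sketched outside of BigQuery as well. The following is a minimal Python illustration, not the original query: the two rows and the helper name normalize_spaces are hypothetical, chosen so that the strings differ only in the number of spaces between words.

```python
import re
from collections import Counter


def normalize_spaces(value):
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


# Hypothetical data: the row with id=1 contains two spaces, id=2 only one.
rows = [(1, "abc  xyz"), (2, "abc xyz")]

# Grouping on the raw strings yields two groups, one per row.
raw_counts = Counter(text for _, text in rows)

# Grouping on the normalized strings yields a single group of two rows.
normalized_counts = Counter(normalize_spaces(text) for _, text in rows)

print(len(raw_counts), dict(normalized_counts))
```

Whether collapsing spaces is acceptable depends on the data; if the extra spaces are meaningful, grouping on the raw values is the correct behavior.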
I need case insensitivity in my queries so I found IGNORE CASE which works superbly when used in queries that target the browser (I am talking about BQ web UI). If I choose a destination table (an absolute must for me) and select Allow Large Results (with unchecked Flatten Results) then I get a cryptic error like this:
Error: unexpected LIMIT clause at: 2.200 - 2.206
Even though this Official Google BigQuery issue and feature request tracker post seems to speak of the same issue and even though the problem seems to have been acknowledged back in Jan 2015 the solution isn't apparent.
I could potentially use a bunch of temp tables with lowercased search columns as a workaround but that sounds awfully difficult with the number of tables and columns that I have and the complex queries that I intend to run.
Any other possible workarounds? Why isn't this working yet on BQ?
Yes, it is a known problem, and it has not been neglected. The code changes to fix it are (surprisingly) not trivial, but they are mostly done. Now the team is carefully looking at how to enable and deploy them. I cannot give you a timeline, but the fix to this problem is coming.
The only workaround in the meantime is to wrap the operands of all string comparisons, string GROUP BYs and string ORDER BYs in LOWER() (or UPPER()).
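As a rough illustration of the LOWER() workaround, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in engine. It is not BigQuery, and the table t, the column name and the sample values are made up for the example; only the pattern of wrapping both sides in LOWER() carries over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [("Alice",), ("ALICE",), ("bob",)])

# Case-insensitive GROUP BY and ORDER BY: apply LOWER() to the column.
grouped = conn.execute(
    "SELECT LOWER(name), COUNT(*) FROM t "
    "GROUP BY LOWER(name) ORDER BY LOWER(name)"
).fetchall()

# Case-insensitive comparison: apply LOWER() to both operands.
matches = conn.execute(
    "SELECT COUNT(*) FROM t WHERE LOWER(name) = LOWER(?)", ("aLiCe",)
).fetchone()[0]

conn.close()
print(grouped, matches)
```

The cost is that every string expression in the query has to be wrapped by hand, which is exactly the tedium the question describes.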
I have a large SQL file where the identifiers are not quoted. IntelliJ IDEA suggests the intention "Quote identifier". I can apply it one by one, but this is very cumbersome since I have a lot of different identifiers there (database names, table names, fields).
Using the Analyze/Run inspection by name command, I can apply the "Identifier should be quoted" inspection to my whole file. But this inspection does not yield any result, for some unknown reason. The result of the analysis is "No suspicious code found".
I also defined an exact scope of the files I want to apply the intention on, but it didn't help - same result.
How can I achieve the application of the intention multiple times at once?
I don't know how to apply intention actions in batch (I don't even know if this is possible), but for your problem you can do the following:
Open Settings > Editor > SQL. In the General tab, find the option Identifier quotation and set it to Quote. Click Apply.
Open your file and reformat it (Code > Reformat Code or Ctrl+Alt+L).
It should normally add quotes to all the identifiers in the file.
I have inherited a really bad Access database, and I need to move its data into a MySQL database. I have a field that has some string data in it followed by an oddly formatted date in parentheses at the end. I am trying to craft a query with one field that contains only the string up to the open parenthesis "(" and a second field that contains only the contents of the parentheses.
Following advice I have found here and elsewhere I have tried
note: Left(notefield, InStr(notefield, "("))
but I get the error "Undefined function 'left' in expression." even though I built it using the Builder. So any ideas what I should use in my Access query to extract this data? And it has to be in Access.
VB(A) is funny this way (I say it that way because this same problem shows up in VB6 as well as in the VBA in Office applications)....
When built-in functions such as Left, Right$, InStr, etc. start throwing "Undefined Function" errors, it almost always means that you've got a problem with References. Some library that you have defined a reference to is missing or broken, and it's not necessarily the one that's reporting errors.
Check Tools | References... and make sure that nothing that's checked says it's "MISSING". If it is, either remove it or fix the link (you can Browse... to the .dll file if you know where it's stored).
I was writing some Unit tests last week for a piece of code that generated some SQL statements.
I was trying to figure out a regex to match SELECT, INSERT and UPDATE syntax so I could verify that my methods were generating valid SQL, and after 3-4 hours of searching and messing around with various regex editors I gave up.
I managed to get partial matches but because a section in quotes can contain any characters it quickly expands to match the whole statement.
Any help would be appreciated. I'm not very good with regular expressions, but I'd like to learn more about them.
By the way it's C# RegEx that I'm after.
Clarification
I don't want to need access to a database, as this is part of a unit test and I don't want to have to maintain a database to test my code, which may live longer than the project.
Regular expressions can match only languages that a finite state automaton can parse, which is a very limited class, whereas SQL syntax is not a regular language. It can be demonstrated that you can't validate SQL with a regex. So, you can stop trying.
SQL is a type-2 grammar; it is too powerful to be described by regular expressions. It's the same as if you decided to generate C# code and then validate it without invoking a compiler. A database engine is in general too complex to be easily stubbed.
That said, you may try ANTLR's SQL grammars.
As far as I know this is beyond regex, and you're getting close to the dark arts of BNF and compilers.
http://savage.net.au/SQL/
The same thing happens to people who want to do correct syntax highlighting. You start cramming things into regex and then you end up writing a compiler...
I had the same problem. An approach that would work for most standard SQL statements is to spin up an in-memory SQLite database and issue the query against it; if you get back a "table does not exist" error, then your query parsed properly.
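The in-memory SQLite idea can be sketched roughly as follows, here in Python with the standard sqlite3 module rather than C#. The function name parses_ok and the rule for classifying error messages are assumptions made for this sketch: SQLite reports a missing object as "no such table" or "no such column", and anything else is treated as a parse failure. SQLite's dialect also differs from other engines, so this only approximates a syntax check.

```python
import sqlite3


def parses_ok(sql):
    """Return True if SQLite can parse a single statement.

    Errors about missing tables or columns are treated as a successful
    parse, because the database is empty and those errors only show up
    once the statement's syntax has been accepted.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # EXPLAIN compiles the statement without executing it.
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error as exc:
        message = str(exc)
        return message.startswith(("no such table", "no such column"))
    finally:
        conn.close()


print(parses_ok("SELECT * FROM missing_table WHERE id = 1"))
print(parses_ok("SELECT * FRM missing_table"))
```

In a unit test, each generated statement would be passed to such a helper, and no database file or schema has to be maintained alongside the tests.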
Off the top of my head: Couldn't you pass the generated SQL to a database and use EXPLAIN on them and catch any exceptions which would indicate poorly formed SQL?
Have you tried lazy quantifiers? Rather than matching as much as possible, they match as little as possible, which is probably what you need for quotes.
To validate the queries, just run them with SET NOEXEC ON; that is how Enterprise Manager does it when you parse a query without executing it.
Besides, if you are using regex to validate SQL queries, you can be almost certain that you will miss some corner cases, or that the query is not valid for other reasons even if it's syntactically correct.
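To illustrate the difference between greedy and lazy matching on quoted sections, here is a small sketch in Python; .NET regular expressions accept the same `*?` quantifier. The sample WHERE clause is made up for the example.

```python
import re

sql = "WHERE a = 'x' AND b = 'y'"  # hypothetical fragment with two literals

# Greedy: .* runs to the last quote, swallowing everything in between.
greedy = re.findall(r"'.*'", sql)

# Lazy: .*? stops at the first closing quote it can reach.
lazy = re.findall(r"'.*?'", sql)

print(greedy)  # ["'x' AND b = 'y'"]
print(lazy)    # ["'x'", "'y'"]
```

This keeps a quoted section from expanding across the whole statement, but it does not by itself make a regex able to validate SQL.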
I suggest creating a database with the same schema, possibly using an embedded sql engine, and passing the sql to that.
I don't think that you even need to have the schema created to be able to validate the statement, because the system will not try to resolve object_name etc until it has successfully parsed the statement.
With Oracle as an example, you would certainly get an error if you did:
select * from non_existant_table;
In this case, "ORA-00942: table or view does not exist".
However if you execute:
select * frm non_existant_table;
Then you'll get a syntax error, "ORA-00923: FROM keyword not found where expected".
It ought to be possible to classify errors into parsing errors that indicate incorrect syntax and errors relating to table names, permissions, etc.
Add to that the problem of different RDBMSs and even different versions allowing different syntaxes and I think you really have to go to the db engine for this task.
There are ANTLR grammars to parse SQL. It's really a better idea to use an in-memory database or a very lightweight database such as SQLite. It seems wasteful to me to test whether the SQL is valid from a parsing standpoint, and much more useful to check the table and column names and the specifics of your query.
The best way is to validate the parameters used to create the query, rather than the query itself. A function that receives the variables can check the length of the strings, valid numbers, valid emails or whatever. You can use regular expressions to do these validations.
using System.Text.RegularExpressions;

// Rough shape check only: matches SELECT ... FROM ... WHERE, not full SQL syntax.
public bool IsValid(string sql)
{
    string pattern = @"SELECT\s.*FROM\s.*WHERE\s.*";
    Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
    return rgx.IsMatch(sql);
}
I am assuming you did something like .* so try [^"]* instead, which will keep you from eating the whole line. It will still give false positives in cases where you have \ inside your strings.
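A short sketch of the negated character class, again in Python (the pattern itself is the same in C#). The sample text is made up; its second literal contains a backslash-escaped quote to show the false positive mentioned above.

```python
import re

# Hypothetical input: the second literal contains an escaped quote (\").
text = r'a = "x" AND b = "y\"z"'

# [^"]* cannot cross a quote, so each match ends at the nearest quote.
quoted = re.findall(r'"[^"]*"', text)

# The first literal is matched correctly; the second is cut short at the
# escaped quote, because the pattern knows nothing about backslashes.
print(quoted)
```

Handling escapes correctly requires a more elaborate pattern or a real tokenizer, which is where the regex approach starts to break down.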