Tool to identify problems in catch blocks

I am looking for a tool (maybe build-time or an Eclipse plugin) that can help me identify places where I am not logging the exception trace/message.
We have a legacy application with try/catch blocks in which a custom error message is logged. The exception itself is neither logged nor rethrown, so when a problem occurs there is no stack trace in the log files to help debug the issue. An example of this is:
try {
    do something....
} catch (Throwable exception) {
    Log.log("<<custom message>>");
}
I need a tool like Coverity or Checkstyle that can help me to identify all such occurrences in my code base.

I'd expect you to be able to do a decent job with any tool that can search text using regular expressions (e.g., grep).
The regex would be something like this:
"catch\W*\(.*\)\W*{\W*Log\.log"
where \W stands for some whitespace recognizer that picks up blanks and newlines.
Your pattern is unique enough that I'd expect very few false hits, if the programmers were consistent with the convention you showed.
[EDIT] OP indicates
I am looking for catch blocks where I am NOT doing the following - '+ exception':
try { do something.... }
catch (Throwable exception)
{ Log.log("<<custom message>>" + exception); }
We're back to a regular expression as a pretty decent hack. You need to hunt for any place that doesn't call Log.log("<<....>", or, if it does, doesn't have a following "+ exception".
This is awkward to code as a regexp without a "not" operator, but possible. Assuming the catch clause exists (a different regexp test), and the Log.log call exists, this will probably do it:
"catch\W*\(.*?\)\W*\{\W*Log\.log\(\".*?\"[^+)]*\)"
The tail [^+)]*\) checks that no "+" appears between the string literal and the closing parenthesis, so anything matching this doesn't have the "+ exception".
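If you want something more repeatable than a grep one-liner, the same heuristic is easy to script in the language of the code base itself. Here is a minimal Java sketch along those lines; the Log.log name is the convention from the question, the class name and directory walk are mine, and since it is still regex-based it will produce the same false hits and misses a grep would:

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.regex.*;

public class SuspectCatchFinder {
    // A catch clause whose block starts with Log.log("...") where the
    // argument is a bare string literal, i.e. no "+ exception" follows.
    private static final Pattern SUSPECT = Pattern.compile(
        "catch\\s*\\([^)]*\\)\\s*\\{\\s*Log\\.log\\(\\s*\"[^\"]*\"\\s*\\)");

    public static void main(String[] args) throws IOException {
        Files.walk(Paths.get(args[0]))
             .filter(p -> p.toString().endsWith(".java"))
             .forEach(p -> {
                 try {
                     Matcher m = SUSPECT.matcher(new String(Files.readAllBytes(p)));
                     while (m.find()) {
                         // Flatten whitespace so each hit prints on one line.
                         System.out.println(p + ": " + m.group().replaceAll("\\s+", " "));
                     }
                 } catch (IOException e) {
                     throw new UncheckedIOException(e);
                 }
             });
    }
}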
Our Source Code Search Engine (SCSE) uses the lexemes of the language rather than regexes to enable straightforward searches, so it has a slightly unusual query language written in terms of language lexemes. It also allows "negation" on a larger scale; you can subtract hits in two regions, and that's really useful. This means the following query would do the trick:
'catch' '(' I I ')' '{' I - I=Log '.' I=log '(' S '+' I ')'
This finds hits for all "catch" clauses together with the start of the block, and subtracts away any matches of the logging idiom. Quoted terms are language atoms. I stands for "I(dentifier)"; it can be any identifier (just I) or constrained to a particular regex for the identifier (of which "Log" is a particularly simple one). S stands for "S(tring)", which also allows constraints that we don't need for this query. The query has two sub-queries: the part before the minus sign finds "catch" clauses and a prefix of the catch body, and the part after the minus sign finds the logging idiom OP says he wants. Any overlap between results of the second sub-query and results of the first causes the overlapped hits to be "subtracted" (the minus sign) from the result. So the final results are "catch clauses that don't start with a logging step".
A more sophisticated check requires finding the catch clauses and the logging calls, and verifying that the logging calls do not occur anywhere in the catch block. The SCSE can't do this by itself; more sophisticated engines that parse and build ASTs can be used to determine this. I know of tools that can do this too, if OP wants further elaboration.


Precedence inside a function call

Using the defined-or operator ( // ) in a function call produces the result I'd expect:
say( 'nan'.Int // 42); # OUTPUT: «42»
However, using the lower-precedence orelse operator instead throws an error:
say( 'nan'.Int orelse 42);
# OUTPUT: «Error: Unable to parse expression in argument list;
# couldn't find final ')'
# (corresponding starter was at line 1)»
What am I missing about how precedence works?
(Or is the error a bug and I'm just overthinking this?)
I'd say it's a grammar bug, as
say ("nan".Int orelse 42); # 42
works.
TL;DR My super useful naanswer (not-an-answer / non-authoritative answer / food for thought) is it might be a bug or it might not. :)
Other examples:
say(42 and 42);
say(42 ==> 99);
yield the same error.
What am I missing about how precedence works?
Perhaps nothing. Perhaps it will be desirable and possible to fix the grammar so these function-call-arg-list-signifying parens determine precedence just like plain expression parens do.
If so, perhaps fixing it would best wait, or perhaps realistically must wait, until when or after RakuAST lands (6.e?). Or perhaps even later, if/when grammar cleanup/slangs lands (6.f?).
Or perhaps it's going to always stay as it is for reasons such as good usability (despite the initial "huh?") and/or expediency and/or single-pass parsing and/or whatever.
I've dug a little to see if I could find relevant commentary. Here are some (juicy?) bits:
the OPP is a bit more complex than a standard binary-operator OPP
(from a comment on #perl6)
If you scroll backwards from Larry's comment you'll see he said this in the context of Raku's extraordinary seamless parsing (no delimiters introduced) in a single pass of nested sub-languages that each can have arbitrary grammars.
(Btw, one thought I had: did std parse say(42 and 42) fine? I'm not sure if there's a running std anywhere these days.)
While we do have complete control of stock Raku, I'm not convinced there's anything compelling about bending over backwards to fix every wrinkle of this sort (foo(... op ...) in this case). The general case (....., where the middle ... inside the outer pair of .s has arbitrary syntax) means we'll be hitting limits in how "perfect" it can all be once there's a huge amount of anarchic language/syntax mixing going on in userland/module space, as I anticipate will emerge in years to come.
So, imo, if it's reasonably easy to fix, without unduly cramping or burdening user slang freedom, great. If not, I think the current situation is fair enough (though perhaps it'll be desirable, viable and reasonable to improve the error message).
Perhaps consider the foregoing in combination with:
Raku borrows many concepts from human language ...
(from the doc)
in combination with:
☞ Self-clocking code produces better syntax error messages
(from Seeing Wrong Right)
in combination with:
Break that clock and your error messages will turn to mush
(from a mailing list comment)
But then again:
Please don't assume that rakudo's idiosyncrasies and design fossils are canonical.
Do you mean this, maybe...?
> say ( NaN.Int orelse 42 )
42
since
> say( NaN.Int orelse 42 )
===SORRY!=== Error while compiling:
Unable to parse expression in argument list; couldn't find final ')' (corresponding starter was at line 1)
------> say( NaN.Int⏏ orelse 42 )
expecting any of:
infix
infix stopper
I would tend to agree with lizmat that there is a grammar bug in the compiler.

Regex/token/rule to match nested curly braces?

I need to match the values of key = value pairs in BibTeX files, which are delimited by braces and can contain arbitrarily nested braces. I've got as far as matching at most two-deep nested curly braces, like {some {stuff} like {this}}, with the kludgey:
token brace-value {
    '{' <-[{}]>* ['{' <-[}]>* '}' <-[{}]>* ]* '}'
}
I shudder at the idea of going one level further down... but proper parsing of my BibTeX stuff needs at least three levels deep.
Yes, I know there are BibTeX parsers around, but I need to grab the complete entry for further processing, and peek at a few keys meanwhile. My *.bib files are rather tame (and I wouldn't mind handling a few stray entries by hand); the problem is that I have a lot of them, with much overlap. But some of the "same" entries have different keys, or extra data. I want to consolidate them into a few master files (the whole idea behind BibTeX, right?). Not fun by hand if bibtool gives a file with no duplicates (ha!) of some 20 thousand lines...
After perusing Lenz' "Parsing with Perl 6 Regexes and Grammars" (Apress, 2017), I realized the "regex" machinery (based on backtracking) might actually be a lot more capable than officially admitted, as a regex can call another, and nowhere do I see a prohibition on recursive calls.
Before digging in, a bit of context-free grammar theory: a way to describe nested braces (and nothing else) is with the grammar:
S -> { S } S | <nothing>
I.e., nested braces are either an opening brace, nested braces, a closing brace, and more nested braces; or nothing whatsoever. This translates more or less directly to Raku (there is no empty regex; fake it by making the construction optional):
my regex nb {
    [ '{' <nb> '}' <nb> ]?
}
Lo and behold, this works. It still needs fixing up to avoid captures, to kill backtracking (if it doesn't match on the first try, it won't ever match), and to decorate it with "anything else" fillers.
my regex nested-braces {
    :ratchet
    <-[{}]>*
    [ '{' <.nested-braces> '}' <.nested-braces> ]?
    <-[{}]>*
};
This checks out with my test cases.
For not-so-adventurous souls, there is the Text::Balanced module for Perl (formerly Perl 5, callable from Raku using Inline::Perl5). Not directly useful to me inside a grammar, unfortunately.
Solution
A way to describe nested braces (and nothing else)
Presuming a rule named &R, I'd likely write the following pattern if I was writing a quick small one-off script:
\{ <&R>* \}
If I was writing a larger program that should be maintainable I'd likely be writing a grammar and, using a rule named R the pattern would be:
'{' ~ '}' <R>*
This latter avoids leaning toothpick syndrome and uses the regex ~ operator.
These will both parse arbitrarily deeply nested paired braces, e.g.:
say '{{{{}}}}' ~~ token { \{ <&?ROUTINE>* \} } # 「{{{{}}}}」
(&?ROUTINE refers to the routine in which it appears; a regex is a routine. Note that you can't use <&?ROUTINE> in a regex declared with / ... / syntax.)
regex vs token
kill backtracking
my regex nested-braces {
:ratchet
The only difference between patterns declared with regex and token is that the former turns ratcheting off. So using it and then immediately turning ratcheting on is notably unidiomatic. Instead:
my token nested-braces {
Backtracking
the "regex" machinery (based on backtracking)
The grammar/regex engine does include backtracking as an optional feature because that's occasionally exactly what one wants.
But the engine is not "based on backtracking", and many grammars/parsers make little or no use of backtracking.
Recursion
a regex can call another, and nowhere do I see a prohibition on recursive calls.
This alone is nothing special for contemporary regex engines.
PCRE has supported recursion since 2000, and named regexes since 2003. Perl's default regex engine has supported both since 2007.
Their support for deeper levels of recursion and more named regexes being stored at once has been increasing over time.
Damian Conway's PPR uses these features of regexes to build non-trivial (but still small) parse trees.
Capabilities
a lot more capable
Raku "regexes" can be viewed as a cleaned up take on the unfolding regex evolution. To the degree this helps someone understand them, great.
But really, it's a whole new deal. For example, they're Turing complete, in a sensible way, and thus able to parse anything.
than officially admitted
Well that's an odd thing to say! Raku's Grammars are frequently touted as one of Raku's most innovative features.
There are three major caveats:
Performance: the primary current caveat is that a well-written C parser will blow the socks off a well-written Raku Grammar based parser.
Pay-off: it's often not worth the effort of writing a fully correct parser for a non-trivial format if an existing parser is available.
Left recursion: Raku does not automatically rewrite left recursion, so a left-recursive grammar loops forever.
Using existing parsers
I know there are BibTeX parsers around, but I need to grab the complete entry for further processing, and peek at a few keys meanwhile.
Using a foreign module in Raku can be a bit of a revelation. It is not necessarily like anything you'll have experienced before. Raku's foreign language adaptors can do smart marshaling for you so it can be like you're using native Raku features.
Two of the available foreign language adaptors are already sufficiently polished to be amazing -- the ones for Perl and for C.
I'm pretty sure there's a BibTeX package for Perl that wraps a C BibTeX parser. If you used that you'd hopefully get parsing results all nicely wrapped up into Raku objects as if it was all Raku in the first place, but retaining much of the high performance of the C code.
A Raku BibTeX Grammar?
Perhaps your needs do call for creating and using a small Raku Grammar.
(Maybe you're doing this partly as an exercise to familiarize yourself with Raku, or the regex/grammar aspect of Raku. For that it sounds pretty ideal.)
As soon as you begin to use multiple regexes together -- even just two -- you are closing in on grammar territory. After all, they're just an easy-to-use construct for using multiple regexes together.
So if you decide you want to stick with writing parsing code in Raku, expect to write it something like this:
grammar BiBTeX {
    token TOP { ... }
    token ...
    token ...
}
BiBTeX.parse: my-bib-file
For more details, see the official doc's Grammar tutorial or read Moritz's book.
OK, just (re)checked. The documentation of '{' ~ '}' leaves a whole lot to be desired; it is not at all clear that it is meant to handle balanced, correctly nested delimiters.
So my final solution is really just along the lines:
my regex nested-braces {
    :ratchet
    '{' ~ '}' .*
}
Thanks everyone! Learned quite a bit today.

Why are there two ways of indicating error in Elixir?

Some Elixir functions have two variants for indicating errors:
Return a tuple, e.g. File.open, which returns something like {:ok, io_device} or {:error, posix}
Raise an exception, e.g. File.open!
My questions are:
What's the intention of having two ways?
Is one preferred over the other (like best practices)?
There are two ways of handling errors, because there are two types of errors:
the expected errors - like the user providing bad data, etc. In that case you use the tuple-style return values to handle the error. This also forces the caller to consider the error case and handle it properly.
the truly unexpected errors - like a configuration file suddenly disappearing - that can't be recovered from and where there's not much to do besides crashing. In that case you raise an exception.
Because of those two ways, you extremely rarely find yourself in need of rescuing exceptions - where in other languages you would rescue an exception, in Elixir you avoid raising it in the first place and return an ok/error tuple instead.
I'd say the tuple-style is superior, as it gives control to the caller: the caller can decide what to do with the error, either by pattern matching on the return value in a case expression and handling both possibilities, or by ignoring the error case and pattern matching directly on the ok tuple. The latter will convert the return value to a MatchError exception, should the unexpected error occur. You can see how the first style can be easily converted to the second one. That said, many libraries provide "bang" functions that raise the error, for ease of use and the ability to provide better error messages than a plain MatchError allows.
While {:ok, value} is often paired with {:error, reason}, it's merely a convention. There are many APIs that return only :error without a reason, where the reason is obvious; there are also some that return something different in the successful case. The rule here is to provide an easy pattern match that is not order dependent. Let's see some examples:
{value, rest} | :error
That's a good choice, since the cases are easily distinguishable - this style is used, for example, by Integer.parse/2. If the success condition has two return values and there's only one reason for failure, this style is recommended.
string | :error
This doesn't seem like a good idea: you'd either need a guard in the pattern match or be careful to match the :error atom first. Instead, one would wrap the success value in a {:ok, string} tuple for ease of use.

Split SQL statements

I am writing a backend application which needs to be able to send multiple SQL commands to a MySQL server.
MySQL >= 5.x supports multiple statements, but unfortunately we are interfacing with MySQL 4.x.
I am trying to find a way (hint: regex) to split SQL statements on their semicolons, while ignoring semicolons inside single- and double-quoted strings.
http://www.dev-explorer.com/articles/multiple-mysql-queries has a very nice regex to do that, but it doesn't support double quotes.
I'd be happy to hear your suggestions.
Can't be done with a regex; it's insufficiently powerful to parse SQL. There may be an SQL parser available for your language — which is it? — but parsing SQL is quite hard, especially given the range of different syntaxes available. Even in MySQL alone, there are many SQL_MODE flags at the server and connection level that can affect how basic strings and comments are parsed, making statements behave quite differently.
The example at dev-explorer goes to amusing lengths to try to cope with escaped apostrophes and trailing strings, but will still fail for many valid combinations of them, not to mention the double quotes, backticks, the various comment syntaxes, or ANSI SQL_MODE.
As bobince said, regular expressions are probably not going to be powerful enough to do this. They're certainly not going to be powerful enough to do it in any halfway elegant manner. The second link cdonner provided also does not address this; most answers there were trying to talk the questioner out of doing this without semicolons; if he had taken the general advice, then he'd have ended up where you are.
I think the quickest path to solving this is going to be with a string scanner function that examines every character of the string in sequence and reacts based on a bit of stored state. Rough pseudocode:
Read in a character
If the character is not special, CONTINUE
If the character is escaped (checking this probably requires examining the previous character), CONTINUE
If the character would start a new string or end an existing one, toggle a flag IN_STRING (you might need multiple flags for different string types... I've honestly tried and succeeded at remaining ignorant of the minutiae of SQL quoting/escaping) and CONTINUE
If the character is a semicolon AND we are not currently in a string, we have found a query! OUTPUT it and CONTINUE scanning until the end of the string.
Language parsing is not one of my areas of experience, so you'll want to consider this approach carefully; nonetheless, it's going to be fast (with C-style strings, none of those steps is at all expensive, save possibly for the OUTPUT, depending on what "outputting" means in your context) and I think it should get the job done.
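As a concrete illustration of that scanner, here is a rough Java sketch. The names are mine, and per the assumptions above it only honors backslash escapes and single/double quotes; backticks, comments, and SQL_MODE quirks are deliberately ignored.

import java.util.ArrayList;
import java.util.List;

public class StatementSplitter {
    // Split on semicolons that are not inside single- or double-quoted
    // strings; a backslash escapes whatever character follows it.
    public static List<String> split(String sql) {
        List<String> statements = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        char quote = 0;            // 0 = outside any string, else the quote char
        boolean escaped = false;   // the previous character was a backslash
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (escaped) {
                escaped = false;               // escaped char: no special meaning
            } else if (c == '\\') {
                escaped = true;
            } else if (quote != 0) {
                if (c == quote) quote = 0;     // closing quote ends the string
            } else if (c == '\'' || c == '"') {
                quote = c;                     // opening quote starts a string
            } else if (c == ';') {
                statements.add(current.toString().trim());  // statement found
                current.setLength(0);
                continue;                      // drop the semicolon itself
            }
            current.append(c);
        }
        if (current.toString().trim().length() > 0) {
            statements.add(current.toString().trim());      // trailing statement
        }
        return statements;
    }
}

For example, split("SELECT 'a;b'; SELECT 2;") yields ["SELECT 'a;b'", "SELECT 2"].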
Maybe with the following Java regexp? Check the test...
@Test
public void testRegexp() {
    String s = //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n" + //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n";
    String regexp = "([^;]*?('.*?')?)*?;\\s*";
    assertEquals("<statement><statement>", s.replaceAll(regexp, "<statement>"));
}
I would suggest seeing if you can redefine the problem space so that you don't need to send multiple queries separated only by their terminator.
Try this. I just replaced the first ' with \" and it seems to work for both ' and ":
;+(?=([^\"|^\\']['|\\'][^'|^\\']['|\\'])[^'|^\\'][^'|^\\']$)

How to make the Lucene QueryParser more forgiving?

I'm using Lucene.net, but I am tagging this question for both .NET and Java versions because the API is the same and I'm hoping there are solutions on both platforms.
I'm sure other people have addressed this issue, but I haven't been able to find any good discussions or examples.
By default, Lucene is very picky about query syntax. For example, I just got the following error:
[ParseException: Cannot parse 'hi there!': Encountered "<EOF>" at line 1, column 9.
Was expecting one of:
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
"[" ...
"{" ...
<NUMBER> ...
]
Lucene.Net.QueryParsers.QueryParser.Parse(String query) +239
What is the best way to prevent ParseExceptions when processing queries from users? It seems to me that the most usable search interface is one that always executes a query, even if it might be the wrong query.
It seems that there are a few possible, and complementary, strategies:
"Clean" the query prior to sending it to the QueryProcessor
Handle exceptions gracefully
Show an intelligent error message to the user
Perhaps execute a simpler query, leaving off the erroneous bit
I don't really have any great ideas about how to do any of those strategies. Has anyone else addressed this issue? Are there any "simple" or "graceful" parsers that I don't know about?
You can make Lucene ignore the special characters by sanitizing the query with something like
query = QueryParser.Escape(query)
If you do not want your users to ever use advanced syntax in their queries, you can do this always.
If you want your users to use advanced syntax but you also want to be more forgiving of mistakes, you should only sanitize after a ParseException has occurred.
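For the Java version, the equivalent is the static QueryParser.escape(String) method (in the classic 2.x/3.x API that's org.apache.lucene.queryParser.QueryParser), so the sanitize-only-after-failure approach might look like this sketch:

// Try the raw query first; if parsing fails, escape everything and retry,
// treating the input as literal text rather than as query syntax.
Query parseForgiving(QueryParser qp, String raw) throws ParseException {
    try {
        return qp.parse(raw);
    } catch (ParseException e) {
        return qp.parse(QueryParser.escape(raw));
    }
}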
Well, the easiest thing to do would be to give the raw form of the query a shot, and if that fails, fall back to cleaning it up.
Query safe_query_parser(QueryParser qp, String raw_query)
    throws ParseException
{
    Query q;
    try {
        q = qp.parse(raw_query);
    } catch (ParseException e) {
        q = null;
    }
    if (q == null)
    {
        String cooked;
        // consider changing this "" to " "
        cooked = raw_query.replaceAll("[^\\w\\s]", "");
        q = qp.parse(cooked);
    }
    return q;
}
This gives the raw form of the user's query a chance to run, but if parsing fails, we strip everything except letters, numbers, spaces and underscores; then we try again. We still risk throwing ParseException, but we've drastically reduced the odds.
You could also consider tokenizing the user's query yourself, turning each token into a term query, and glomming them together with a BooleanQuery. If you're not really expecting your users to take advantage of the features of the QueryParser, that would be the best bet. You'd be completely(?) robust, and users could search for whatever funny characters make it through your analyzer.
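Here is a minimal sketch of that tokenize-it-yourself idea, again assuming the classic (pre-4.0) Java API. The field name "contents" and the whitespace split are stand-ins; real code would run the tokens through the same analyzer used at index time.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class HandRolledQuery {
    // OR the user's tokens together; QueryParser is never involved,
    // so no input can trigger a ParseException.
    public static Query fromUserInput(String userInput) {
        BooleanQuery bq = new BooleanQuery();
        for (String token : userInput.toLowerCase().split("\\s+")) {
            if (token.length() > 0) {
                bq.add(new TermQuery(new Term("contents", token)),
                       BooleanClause.Occur.SHOULD);
            }
        }
        return bq;
    }
}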
FYI... Here is the code I am using for .NET
private Query GetSafeQuery(QueryParser qp, String query)
{
    Query q;
    try
    {
        q = qp.Parse(query);
    }
    catch (Lucene.Net.QueryParsers.ParseException e)
    {
        q = null;
    }
    if (q == null)
    {
        string cooked;
        cooked = Regex.Replace(query, @"[^\w\.#-]", " ");
        q = qp.Parse(cooked);
    }
    return q;
}
I'm in the same situation as you.
Here's what I do. I do catch the exception, but only so that I can make the error look prettier. I don't change the text.
I also provide a link to an explanation of the Lucene syntax which I have simplified a little bit:
http://ifdefined.com/btnet/lucene_syntax.html
I do not know much about Lucene.net. For general Lucene, I highly recommend the book Lucene in Action. For the question at hand, it depends on your users. There are strong reasons, such as ease of use, security and performance, to limit your users' queries. The book shows ways to parse the queries using a custom parser instead of QueryParser. I second Jay's idea about the BooleanQuery, although you can build stronger queries using a custom parser.
If you don't need all of Lucene's features, you might do better by writing your own query parser. It's not as complicated as it might seem at first.