How can I verify that a Lucene query embedded in a larger XQuery does not contain a syntax error before launching the complete XQuery I want to run? - lucene

I have an application for which I need to allow the user to perform full text search on documents, and use the Lucene Query Parser syntax if desired. The eXist database is queried from a Django backend that uses eulexistdb to talk to eXist.
The problem is that when the user uses an incorrect syntax for the full text search, this is discovered late in the game. The Django application has to query a SQL database to determine some of the parameters of the search. By the time the complete XQuery is built and eXist is accessed, the SQL query has already run, which means that the cost of the SQL query has already been spent. (I know I could marshal the data queried on the SQL side into eXist so that only eXist is queried. It's just not an option for now.)
I'd like to know ahead of time whether the Lucene query has a syntactical error to that I can avoid starting querying the SQL database for nothing.
I've checked the documentation of eXist, but I've not found anything in the API which would be a simple function that checks whether a full-text query is syntactically valid or not.

Here is a simple function that will return True if a Lucene query is fine, or False if there is a syntax error in the query. db must be an instance of eulexistdb.db.ExistDB and query is the Lucene query:
def check(db, query):
try:
db.query(safe_interpolate("ft:query(<doc/>, {lucene_query})",
lucene_query=query))
except ExistDBException as ex:
if ex.message().startswith(
"exerr:ERROR Syntax error in Lucene query string"):
return False
raise ex # Don't swallow other problems that may occur.
return True
This should be adaptable to any language for which there is a library that provides access to eXist. The idea is to run the query of interest against a bogus document (<doc/>). Using the bogus document avoids having to actually search the database. (An empty node sequence might seem better, but we're not running ft:query against an empty node sequence because then the XQuery optimizer could skip trying to parse and run the Lucene query since a valid query on an empty sequence will necessarily return an empty sequence, irrespective of the actual Lucene query.) It does not matter whether it returns any results or not. If the query has no errors, then there won't be an exception. If the query has a syntax error, then an exception will be raised. I've not found a more robust way than checking the error message stored with the exception to detect whether it is a Lucene syntax error or something else.
(The safe_interpolate function is a function that should interpolate lucene_query so as to avoid injections. It is up to you to decide what you need in your application.)

Here is an approach I consider complementary to the one I posted earlier. I'm using lucene-query-parser to perform the check client-side (i.e. in the browser):
define(function (require, exports, _module) {
"use strict";
var lqp = require("lucene-query-parser");
function preDrawCallback() {
// We get the content of the search field.
var search = this.api().search();
var good = true;
try {
lqp.parse(search); // Here we check whether it is syntactically valid.
}
catch (ex) {
if (!(ex instanceof lqp.SyntaxError)) {
throw ex; // Don't swallow exceptions.
}
good = false;
}
// Some work is performed here depending on whether
// the query is good or bad.
return good; // And finally we tell DataTables whether to inhibit the draw.
}
// ....
});
preDrawCallback is used with a DataTables instance. Returning false inhibits drawing the table, which also inhibits performing a query to the server. So if the query is syntactically incorrect, it won't ever make it to the backend. (The define and require calls are there because both my code and lucene-query-parser are AMD modules.)
Potential issues:
If the library that performs the check is buggy or otherwise does not support the entire syntax that Lucene supports, it will block queries that should go through. I've found a few buggy (or at best severely obsolete) libraries before I settled on lucene-query-parser.
If the client-side library happens to support a construct introduced in a later version of Lucene but which is not supported in the version used with eXist. Keeping the backend check I show in my other answer allows to make sure that anything that would slip through is caught there.

Related

What's the difference between string template and log framework's placeholder in Kotlin?

And now, I am trying to rewrite my java application in Kotlin. And then, I met the log statement, like
log.info("do the print thing for {}", arg);
So I have two ways to do the log things in Kotlin like log.info("do the print thing for {}", arg) and log.info("do the print thing for $arg"). The 1st is delegate format to framework like Slf4j or Log4j; the 2nd is using Kotlin string template.
So what's the difference and which one's performance is better?
In general, these two ways produce the same log, unless the logging library is also configured to localise the message and parameters when formatting the message, which Kotlin's string interpolation does not do at all.
The crucial difference lies in the performance, when you turn off logging (at that particular level). As SLF4J's FAQ says:
There exists a very convenient alternative based on message formats.
Assuming entry is an object, you can write:
Object entry = new SomeObject();
logger.debug("The entry is {}.", entry);
After evaluating whether to log or not, and only if the decision is
affirmative, will the logger implementation format the message and
replace the '{}' pair with the string value of entry. In other words,
this form does not incur the cost of parameter construction in case
the log statement is disabled.
The following two lines will yield the exact same output. However, the
second form will outperform the first form by a factor of at least 30,
in case of a disabled logging statement.
logger.debug("The new entry is "+entry+".");
logger.debug("The new entry is {}.", entry);
Basically, if the logging is disabled, the message won't be constructed if you use parameterised logging. If you use string interpolation however, the message will always be constructed.
Note that Kotlin's string interpolation compiles to something similar to what a series of string concatenation (+) in Java compiles to (though this might change in the future).
"foo $bar baz"
is translated into:
StringBuilder().append("foo ").append(bar).append(" baz").toString()
See also: Unable to understand why to use parameterized logging

How to prevent empty list errors in in clause in sql?

One common problem we have in our codebase is that people forget to check if a list is empty before using it in an in clause.
For example (in Scala with Anorm):
def exists(element: String, list: List[String]): Boolean =
SQL("select {element} in {list} as result")
.on('element -> element, 'list -> list)
.as(SqlParser.bool("result").single)
This code works perfectly well as long as list has at least one element.
If it has 0 elements, you get a syntax error, which is weird if you're used to other programming languages that would allow this empty list case.
So, my question is: what's the best way to prevent this error from happening?
Initially, we did this:
def exists(element: String, list: List[String]): Boolean =
if (list.nonEmpty) {
SQL("select {element} in {list} as result")
.on('element -> element, 'list -> list)
.as(SqlParser.bool("result").single)
} else {
false
}
This works perfectly well, and has the added advantage that it doesn't hit the database at all.
Unfortunately, we don't remember to do this every time, and it seems that 1-2 times a month we're fixing an issue related to this.
An alternate solution we came up with was to use a NonEmptyList class instead of a standard List. This class must have at least one element. This works excellent, but again, people have not been diligent with always using this class.
So I'm wondering if there's an approach I'm missing that prevent this type of error better?
It looks like you've already found a way to resolve this problem - you have an exists() function which handles an empty list cleanly. The problem is that people are writing their own exists() functions which don't do that.
You need to make sure that your function is accessible as a utility function, so that you can reuse it whenever you need to, rather than having to rewrite the function.
Your problem is an encapsulation problem: the Anorm API is like an open flame and people can burn themselves. If you rely just on people to take precautions, someone will get burnt.
The solution is to restrict the access to the Anorm API to a limited module/package/area of your code:
Anorm API will be private and accessible only from very few places, where it is going to be easy to perform the necessary controls. This part of the code will expose an API
Every other part of the code will need to go through that API, effectively using Anorm in the "safe" way

Which one is better for me to use: "defer-panic-recover" or checking "if err != nil { //dosomething}" in golang?

I've made a large program that opens and closes files and databases, perform writes and reads on them etc among other things. Since there no such thing as "exception handling in go", and since I didn't really know about "defer" statement and "recover()" function, I applied error checking after every file-open, read-write, database entry etc. E.g.
_,insert_err := stmt.Run(query)
if insert_err != nil{
mylogs.Error(insert_err.Error())
return db_updation_status
}
For this, I define db_updation_status at the beginning as "false" and do not make it "true" until everything in the program goes right.
I've done this in every function, after every operation which I believe could go wrong.
Do you think there's a better way to do this using defer-panic-recover? I read about these here http://golang.org/doc/articles/defer_panic_recover.html, but can't clearly get how to use them. Do these constructs offer something similar to exception-handling? Am I better off without these constructs?
I would really appreciate if someone could explain this to me in a simple language, and/or provide a use case for these constructs and compare them to the type of error handling I've used above.
It's more handy to return error values - they can carry more information (advantage to the client/user) than a two valued bool.
What concerns panic/recover: There are scenarios where their use is completely sane. For example, in a hand written recursive descent parser, it's quite a PITA to "bubble" up an error condition through all the invocation levels. In this example, it's a welcome simplification if there's a deferred recover at the top most (API) level and one can report any kind of error at any invocation level using, for example
panic(fmt.Errorf("Cannot %v in %v", foo, bar))
If an operation can fail and returns an error, than checking this error immediately and handling it properly is idiomatic in go, simple and nice to check if anything gets handled properly.
Don't use defer/recover for such things: Needed cleanup actions are hard to code, especially if stuff gets nested.
The usual way to report an error to a caller is to return an error as an extra return value. The canonical Read method is a well-known instance; it returns a byte count and an error.
But what if the error is unrecoverable? Sometimes the program simply cannot continue.
For this purpose, there is a built-in function panic that in effect creates a run-time error that will stop the program (but see the next section). The function takes a single argument of arbitrary type—often a string—to be printed as the program dies. It's also a way to indicate that something impossible has happened, such as exiting an infinite loop.
http://golang.org/doc/effective_go.html#errors

Fluent NHibernate RegisterFunction SQLFunctionTemplate usage

I've seen this opportunity reported at least half a dozen times with about as many responses.
My problem is, I've got a MySQL database function defined, we'll call it "my_func(int val) returns int", which works fine if I test directly on the database.
I've also gotten it to work with a direct SQL passthrough my repository implementation, which is okay, but I'd rather route it through Hql, for some god-awful reason...
So... I've got a MySQL5Dialect setup to register the function and I'm having some difficulty parsing through the expected conventions.
My understanding is that I need to prefix the function name with "dbo." at some point during the function registration?
Something like this,
//...
RegisterFunction("my_func", new SQLFunctionTemplate(NHibernateUtil.Int32, "my_func(?1)"));
//...
And then through my repository,
var value = repository.FindByHQL<int>("select my_func(2)").Single();
Where FindByHQL returns an IList.
Any thoughts why this wouldn't work.
I'm running the latest WAMP (2.1e I think).
Enough info? Let me know if I can provide any further details.
Thanks,
Michael
select my_func(2)
is not valid HQL, regardless of whether the function is registered or not.
You can use SQL instead if that's your use case.
Post full exception with stack trace if it's not and this was just a simplified example.

Is returning null bad design? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I've heard some voices saying that checking for a returned null value from methods is bad design. I would like to hear some reasons for this.
pseudocode:
variable x = object.method()
if (x is null) do something
The rationale behind not returning null is that you do not have to check for it and hence your code does not need to follow a different path based on the return value. You might want to check out the Null Object Pattern which provides more information on this.
For example, if I were to define a method in Java that returned a Collection I would typically prefer to return an empty collection (i.e. Collections.emptyList()) rather than null as it means my client code is cleaner; e.g.
Collection<? extends Item> c = getItems(); // Will never return null.
for (Item item : c) { // Will not enter the loop if c is empty.
// Process item.
}
... which is cleaner than:
Collection<? extends Item> c = getItems(); // Could potentially return null.
// Two possible code paths now so harder to test.
if (c != null) {
for (Item item : c) {
// Process item.
}
}
Here's the reason.
In Clean Code by Robert Martin he writes that returning null is bad design when you can instead return, say, empty array. Since expected result is an array, why not? It'll enable you to iterate over result without any extra conditions. If it's an integer, maybe 0 will suffice, if it's a hash, empty hash. etc.
The premise is to not force calling code to immediately handle issues. Calling code may not want to concern itself with them. That's also why in many cases exceptions is better than nil.
Good uses of returning null:
If null is a valid functional result, for example: FindFirstObjectThatNeedsProcessing() can return null if not found and the caller should check accordingly.
Bad uses: Trying to replace or hide exceptional situations such as:
catch(...) and return null
API dependency initialization failed
Out of disk space
Invalid input parameters (programming error, inputs must be sanitized by the caller)
etc
In those cases throwing an exception is more adequate since:
A null return value provides no meaningful error info
The immediate caller most likely cannot handle the error condition
There is no guarantee that the caller is checking for null results
However, Exceptions should not be used to handle normal program operation conditions such as:
Invalid username/password (or any user-provided inputs)
Breaking loops or as non-local gotos
Yes, returning NULL is a terrible design, in object-oriented world. In a nutshell, NULL usage leads to:
ad-hoc error handling (instead of exceptions)
ambiguous semantic
slow instead of fast failing
computer thinking instead of object thinking
mutable and incomplete objects
Check this blog post for a detailed explanation: http://www.yegor256.com/2014/05/13/why-null-is-bad.html. More in my book Elegant Objects, Section 4.1.
Who says this is bad design?
Checking for nulls is a common practice, even encouraged, otherwise you run the risk of NullReferenceExceptions everywhere. Its better to handle the error gracefully than throw exceptions when you don't need to.
Based on what you've said so far, I think there's not enough information.
Returning null from a CreateWidget()method seems bad.
Returning null from a FindFooInBar() method seems fine.
Its inventor says it is a billion dollar mistake!
It depends on the language you're using. If you're in a language like C# where the idiomatic way of indicating the lack of a value is to return null, then returning null is a good design if you don't have a value. Alternatively, in languages such as Haskell which idiomatically use the Maybe monad for this case, then returning null would be a bad design (if it were even possible).
If you read all the answers it becomes clear the answer to this question depends on the kind of method.
Firstly, when something exceptional happens (IOproblem etc), logically exceptions are thrown. When exactly something is exceptional is probably something for a different topic..
Whenever a method is expected to possibly have no results there are two categories:
If it is possible to return a neutral value, do so.
Empty enumrables, strings etc are good examples
If such a neutral value does not exist, null should be returned.
As mentioned, the method is assumed to possibly have no result, so it is not exceptional, hence should not throw an exception. A neutral value is not possible (for example: 0 is not especially a neutral result, depending on the program)
Untill we have an official way to denote that a function can or cannot return null, I try to have a naming convention to denote so.
Just like you have the TrySomething() convention for methods that are expected to fail, I often name my methods SafeSomething() when the method returns a neutral result instead of null.
I'm not fully ok with the name yet, but couldn't come up with anything better. So I'm running with that for now.
I have a convention in this area that served me well
For single item queries:
Create... returns a new instance, or throws
Get... returns an expected existing instance, or throws
GetOrCreate... returns an existing instance, or new instance if none exists, or throws
Find... returns an existing instance, if it exists, or null
For collection queries:
Get... always returns a collection, which is empty if no matching[1] items are found
[1] given some criteria, explicit or implicit, given in the function name or as parameters.
Exceptions are for exceptional circumstances.
If your function is intended to find an attribute associated with a given object, and that object does has no such attribute, it may be appropriate to return null. If the object does not exist, throwing an exception may be more appropriate. If the function is meant to return a list of attributes, and there are none to return, returning an empty list makes sense - you're returning all zero attributes.
It's not necessarily a bad design - as with so many design decisions, it depends.
If the result of the method is something that would not have a good result in normal use, returning null is fine:
object x = GetObjectFromCache(); // return null if it's not in the cache
If there really should always be a non-null result, then it might be better to throw an exception:
try {
Controller c = GetController(); // the controller object is central to
// the application. If we don't get one,
// we're fubar
// it's likely that it's OK to not have the try/catch since you won't
// be able to really handle the problem here
}
catch /* ... */ {
}
It's fine to return null if doing so is meaningful in some way:
public String getEmployeeName(int id){ ..}
In a case like this it's meaningful to return null if the id doesn't correspond to an existing entity, as it allows you to distinguish the case where no match was found from a legitimate error.
People may think this is bad because it can be abused as a "special" return value that indicates an error condition, which is not so good, a bit like returning error codes from a function but confusing because the user has to check the return for null, instead of catching the appropriate exceptions, e.g.
public Integer getId(...){
try{ ... ; return id; }
catch(Exception e){ return null;}
}
For certain scenarios, you want to notice a failure as soon as it happens.
Checking against NULL and not asserting (for programmer errors) or throwing (for user or caller errors) in the failure case can mean that later crashes are harder to track down, because the original odd case wasn't found.
Moreover, ignoring errors can lead to security exploits. Perhaps the null-ness came from the fact that a buffer was overwritten or the like. Now, you are not crashing, which means the exploiter has a chance to execute in your code.
What alternatives do you see to returning null?
I see two cases:
findAnItem( id ). What should this do if the item is not found
In this case we could: Return Null or throw a (checked) exception (or maybe create an item and return it)
listItemsMatching (criteria) what should this return if nothing is found?
In this case we could return Null, return an empty list or throw an Exception.
I believe that return null may be less good than the alternatives becasue it requires the client to remember to check for null, programmers forget and code
x = find();
x.getField(); // bang null pointer exception
In Java, throwing a checked exception, RecordNotFoundException, allows the compiler to remind the client to deal with case.
I find that searches returning empty lists can be quite convenient - just populate the display with all the contents of the list, oh it's empty, the code "just works".
Make them call another method after the fact to figure out if the previous call was null. ;-) Hey, it was good enough for JDBC
Well, it sure depends of the purpose of the method ... Sometimes, a better choice would be to throw an exception. It all depends from case to case.
Sometimes, returning NULL is the right thing to do, but specifically when you're dealing with sequences of different sorts (arrays, lists, strings, what-have-you) it is probably better to return a zero-length sequence, as it leads to shorter and hopefully more understandable code, while not taking much more writing on API implementer's part.
The base idea behind this thread is to program defensively. That is, code against the unexpected.
There is an array of different replies:
Adamski suggests looking at Null Object Pattern, with that reply being up voted for that suggestion.
Michael Valenty also suggests a naming convention to tell the developer what may be expected.
ZeroConcept suggests a proper use of Exception, if that is the reason for the NULL.
And others.
If we make the "rule" that we always want to do defensive programming then we can see that these suggestions are valid.
But we have 2 development scenarios.
Classes "authored" by a developer: The Author
Classes "consumed" by another(maybe) developer: the Developer
Regardless of whether a class returns NULL for methods with a return value or not,
the Developer will need to test if the object is valid.
If the developer cannot do this, then that Class/method is not deterministic.
That is, if the "method call" to get the object does not do what it "advertises" (eg getEmployee) it has broken the contract.
As an author of a class, I always want to be as kind and defensive ( and deterministic) when creating a method.
So given that either NULL or the NULL OBJECT (eg if(employee as NullEmployee.ISVALID)) needs to be checked
and that may need to happen with a collection of Employees, then the null object approach is the better approach.
But I also like Michael Valenty's suggestion of naming the method that MUST return null eg getEmployeeOrNull.
An Author who throws an exception is removing the choice for the developer to test the object's validity,
which is very bad on a collection of objects, and forces the developer into exception handling
when branching their consuming code.
As a developer consuming the class, I hope the author gives me the ability to avoid or program for the null situation
that their class/methods may be faced with.
So as a developer I would program defensively against NULL from a method.
If the author has given me a contract that always returns a object (NULL OBJECT always does)
and that object has a method/property by which to test the validity of the object,
then I would use that method/property to continue using the object, else the object is not valid
and I cannot use it.
Bottom line is that the Author of the Class/Methods must provide mechanisms
that a Developer can use in their defensive programming. That is, a clearer intention of the method.
The Developer should always use defensive programming to test the validity of the objects returned
from another class/method.
regards
GregJF
Other options to this, are:
returning some value that indicates success or not (or type of an error), but if you just need boolean value that will indicate success / fail, returning null for failure, and an object for success wouldn't be less correct, then returning true/false and getting the object through parameter.
Other approach would to to use exception to indicates failures, but here - there are actually many more voices, that say this is a BAD practice (as using exceptions may be convenient but has many disadvantages).
So I personally don't see anything bad in returning null as indication that something went wrong, and checking it later (to actually know if you have succeeded or not). Also, blindly thinking that your method will not return NULL, and then base your code on it, may lead to other, sometimes hard to find, errors (although in most cases it will just crash your system :), as you will reference to 0x00000000 sooner or later).
Unintended null functions can arise during the development of a complex programs, and like dead code, such occurrences indicate serious flaws in program structures.
A null function or method is often used as the default behavior of a revectorable function or overrideable method in an object framework.
Null_function #wikipedia
If the code is something like:
command = get_something_to_do()
if command: # if not Null
command.execute()
If you have a dummy object whose execute() method does nothing, and you return that instead of Null in the appropriate cases, you don't have to check for the Null case and can instead just do:
get_something_to_do().execute()
So, here the issue is not between checking for NULL vs. an exception, but is instead between the caller having to handle special non-cases differently (in whatever way) or not.
For my use case I needed to return a Map from method and then looking for a specific key. But if I return an empty Map, then it will lead to NullPointerException and then it wont be much different returning null instead of an empty Map.
But from Java8 onward we could use Optional. The above is the very reason Optional concept was introduced.
G'day,
Returning NULL when you are unable to create a new object is standard practise for many APIs.
Why the hell it's bad design I have no idea.
Edit: This is true of languages where you don't have exceptions such as C where it has been the convention for many years.
HTH
'Avahappy,