Proper error propagation in clojure - error-handling

I'm currently working on my first major project in clojure and have run into a question regarding coding style and the most "clojure-esque" way of doing something. Basically I have a function I'm writing which takes in a data structure and a template that the function will try to massage the data structure into. The template structure will look something like this:
{
:key1 (:string (:opt :param))
:key2 (:int (:opt :param))
:key3 (:obj (:tpl :template-structure))
:key4 (:list (:tpl :template-structure))
}
Each key is an atom that will be searched for in the given data structure, and it's value will be attempted to be matched to the type given in the template structure. So it would look for :key1 and check that it's a string, for instance. The return value would be a map that has :key1 pointing to the value from the given data structure (the function could potentially change the value depending on the options given).
In the case of :obj it takes in another template structure, and recursively calls itself on that value and the template structure, and places the result from that in the return. However, if there's an error I want that error returned directly.
Similarly for lists I want it to basically do a map of the function again, except in the case of an error which I want returned directly.
My question is what is the best way to handle these errors? Some simple exception handling would be the easiest way, but I feel that it's not the most functional way. I could try and babysit the errors all the way up the chain with tons of if statements, but that also doesn't seem very sporting. Is there something simple I'm missing or is this just an ugly problem?

You might be interested in schematic, which does pretty similar stuff. You can see how it's used in the tests, and the implementation.
Basically I defined an error function, which returns nil for correctly-formatted data, or a string describing the error. Doing it with exceptions instead would make the plumbing easier, but would make it harder to get the detailed error messages like "[person: [first_name: expected string, got integer]]".

Related

Accord.net Codification can't handle non-strings

I am trying to use the Accord.net library to build test method of several of the machine learning algorithms that library supports.
One of the issues I have run into is that when I am trying to codify my string data, the Codification class does not seem capable of dealing with any datatable columns that are not strings, despite the documentation saying otherwise.
Codification codebook = new Codification(fulldata, AllAttributeNames);
I call that line where fulldata is a datatable, and I have tried including columns of both Int32 type and Double type, and the Codification class has thrown an error saying it is unable to convert them to type String.
"System.InvalidCastException: 'Unable to cast object of type 'System.Double' to type 'System.String'.'"
EDIT: It turns out this error is because the Codification system can only handle alternate data types if it is encoding the entire table. I suppose I can see the logic here, although I would prefer a better error, or that the method was a little smarter.
I now have another issue that has cropped up related to this. After changing my code to this:
Codification codebook = new Codification(fulldata);
I then learning.Learn(inputs, outputs) my algorithm and want to use the newly trained algorithm. So the next step would be to take a bunch of test data, make sure it matches the codebooks encoding, and send it through the algorithm. Unfortunately, when I try and use the
int[][] testinput = codebook.Transform(testData, inputColumnNameArray);
It blows up claiming it could not find a mapping to transform. It does this in reference to an Integer column that the codebook correctly did not map to new values. So now it seems this Transform method is not capable of handling non-string columns, and I have not found an overload of it that can, even though the documentation indicates it should be able to handle this.
Does anyone know how to get around this issue without manually building the entire int[][] testinput array one value at a time?
Turns out I was able to answer my own question eventually.
The Codification class has two methods of using it as near as I can tell. The constructor that takes a list of column names, as well as the Transform methods both lack intelligence in dealing with non-string data types, perhaps these methods are going away in the future.
The constructor that just takes a datatable by itself, as well as the Apply method, are both capable of handling data types other than strings. Once I switched to using these two methods my errors went away.
Codification codebook = new Codification(fulldata);
int[][] testinput = codebook.Apply(testData, inputColumnNameArray);
The confusion for me lay in all the example code seemingly randomly using these two methods, but using the Apply method only when processing the training data, and using the Transform method when encoding test data.
I am not sure why they chose to do this in the documentation example code, but it definitely took me a long time to figure out what was going on enough to stop having this particular issue.

How to prevent empty list errors in in clause in sql?

One common problem we have in our codebase is that people forget to check if a list is empty before using it in an in clause.
For example (in Scala with Anorm):
def exists(element: String, list: List[String]): Boolean =
SQL("select {element} in {list} as result")
.on('element -> element, 'list -> list)
.as(SqlParser.bool("result").single)
This code works perfectly well as long as list has at least one element.
If it has 0 elements, you get a syntax error, which is weird if you're used to other programming languages that would allow this empty list case.
So, my question is: what's the best way to prevent this error from happening?
Initially, we did this:
def exists(element: String, list: List[String]): Boolean =
if (list.nonEmpty) {
SQL("select {element} in {list} as result")
.on('element -> element, 'list -> list)
.as(SqlParser.bool("result").single)
} else {
false
}
This works perfectly well, and has the added advantage that it doesn't hit the database at all.
Unfortunately, we don't remember to do this every time, and it seems that 1-2 times a month we're fixing an issue related to this.
An alternate solution we came up with was to use a NonEmptyList class instead of a standard List. This class must have at least one element. This works excellent, but again, people have not been diligent with always using this class.
So I'm wondering if there's an approach I'm missing that prevent this type of error better?
It looks like you've already found a way to resolve this problem - you have an exists() function which handles an empty list cleanly. The problem is that people are writing their own exists() functions which don't do that.
You need to make sure that your function is accessible as a utility function, so that you can reuse it whenever you need to, rather than having to rewrite the function.
Your problem is an encapsulation problem: the Anorm API is like an open flame and people can burn themselves. If you rely just on people to take precautions, someone will get burnt.
The solution is to restrict the access to the Anorm API to a limited module/package/area of your code:
Anorm API will be private and accessible only from very few places, where it is going to be easy to perform the necessary controls. This part of the code will expose an API
Every other part of the code will need to go through that API, effectively using Anorm in the "safe" way

In data flow coverage, does returning a variable use it?

I have a small question in my mind. I researched it on the Internet but no-one is providing the exact answer. My question is:
In data flow coverage criteria, say there is a method which finally returns variable x. When drawing the graph for that method, is that return statement considered to be a use of x?
Yes, a return statement uses the value that it returns. I couldn't find an authoritative reference that says so in plain English either, but here are two arguments:
A return statement passes control from one part of a program to another, just like a method call does. The value being returned is analogous to a function parameter. return therefore is a use just like being a function parameter is a use.
The other kind of use in data flow analysis is when a value leaves the program and has some effect on the outside world, for example by being printed. If we're analyzing a method, rather than an entire program, return causes the value to leave the scope which we're analyzing. So it's a use for the same reason that printing is a use.

Runtime method to get names of argument variables?

Inside an Objective-C method, it is possible to get the selector of the method with the keyword _cmd. Does such a thing exist for the names of arguments?
For example, if I have a method declared as such:
- (void)methodWithAnArgument:(id)foo {
...
}
Is there some sort of construct that would allow me to get access to some sort of string-like representation of the variable name? That is, not the value of foo, but something that actually reflects the variable name "foo" in a local variable inside the method.
This information doesn't appear to be stored in NSInvocation or any of its related classes (NSMethodSignature, etc), so I'm not optimistic this can be done using Apple's frameworks or the runtime. I suspect it might be possible with some sort of compile-time macro, but I'm unfamiliar with C macros so I wouldn't know where to begin.
Edit to contain more information about what I'm actually trying to do.
I'm building a tool to help make working with third-party URL schemes easier. There are two sides to how I want my API to look:
As a consumer of a URL scheme, I can call a method like [twitterHandler showUserWithScreenName:#"someTwitterHandle"];
As a creator of an app with a URL scheme, I can define my URLs in a plist dictionary, whose key-value pairs look something like #"showUserWithScreenName": #"twitter://user?screenName={screenName}".
What I'm working on now is finding the best way to glue these together. The current fully-functioning implementation of showUserWithScreenName: looks something like this:
- (void)showUserWithScreenName:(NSString *)screenName {
[self performCommand:NSStringFromSelector(_cmd) withArguments:#{#"screenName": screenName}];
}
Where performCommand:withArguments: is a method that (besides some other logic) looks up the command key in the plist (in this case "showUserWithScreenName:") and evaluates the value as a template using the passed dictionary as the values to bind.
The problem I'm trying to solve: there are dozens of methods like this that look exactly the same, but just swap out the dictionary definition to contain the correct template params. In every case, the desired dictionary key is the name of the parameter. I'm trying to find a way to minimize my boilerplate.
In practice, I assume I'm going to accept that there will be some boilerplate needed, but I can probably make it ever-so-slightly cleaner thanks to NSDictionaryOfVariableBindings (thanks #CodaFi — I wasn't familiar with that macro!). For the sake of argument, I'm curious if it would be possible to completely metaprogram this using something like forwardInvocation:, which as far as I can tell would require some way to access parameter names.
You can use componentsSeparatedByString: with a : after you get the string from NSStringFromSelector(_cmd) and use your #selector's argument names to put the arguments in the correct order.
You can also take a look at this post, which is describing the method naming conventions in Objective C

Write a compiler for a language that looks ahead and multiple files?

In my language I can use a class variable in my method when the definition appears below the method. It can also call methods below my method and etc. There are no 'headers'. Take this C# example.
class A
{
public void callMethods() { print(); B b; b.notYetSeen();
public void print() { Console.Write("v = {0}", v); }
int v=9;
}
class B
{
public void notYetSeen() { Console.Write("notYetSeen()\n"); }
}
How should I compile that? what i was thinking is:
pass1: convert everything to an AST
pass2: go through all classes and build a list of define classes/variable/etc
pass3: go through code and check if there's any errors such as undefined variable, wrong use etc and create my output
But it seems like for this to work I have to do pass 1 and 2 for ALL files before doing pass3. Also it feels like a lot of work to do until I find a syntax error (other than the obvious that can be done at parse time such as forgetting to close a brace or writing 0xLETTERS instead of a hex value). My gut says there is some other way.
Note: I am using bison/flex to generate my compiler.
My understanding of languages that handle forward references is that they typically just use the first pass to build a list of valid names. Something along the lines of just putting an entry in a table (without filling out the definition) so you have something to point to later when you do your real pass to generate the definitions.
If you try to actually build full definitions as you go, you would end up having to rescan repatedly, each time saving any references to undefined things until the next pass. Even that would fail if there are circular references.
I would go through on pass one and collect all of your class/method/field names and types, ignoring the method bodies. Then in pass two check the method bodies only.
I don't know that there can be any other way than traversing all the files in the source.
I think that you can get it down to two passes - on the first pass, build the AST and whenever you find a variable name, add it to a list that contains that blocks' symbols (it would probably be useful to add that list to the corresponding scope in the tree). Step two is to linearly traverse the tree and make sure that each symbol used references a symbol in that scope or a scope above it.
My description is oversimplified but the basic answer is -- lookahead requires at least two passes.
The usual approach is to save B as "unknown". It's probably some kind of type (because of the place where you encountered it). So you can just reserve the memory (a pointer) for it even though you have no idea what it really is.
For the method call, you can't do much. In a dynamic language, you'd just save the name of the method somewhere and check whether it exists at runtime. In a static language, you can save it in under "unknown methods" somewhere in your compiler along with the unknown type B. Since method calls eventually translate to a memory address, you can again reserve the memory.
Then, when you encounter B and the method, you can clear up your unknowns. Since you know a bit about them, you can say whether they behave like they should or if the first usage is now a syntax error.
So you don't have to read all files twice but it surely makes things more simple.
Alternatively, you can generate these header files as you encounter the sources and save them somewhere where you can find them again. This way, you can speed up the compilation (since you won't have to consider unchanged files in the next compilation run).
Lastly, if you write a new language, you shouldn't use bison and flex anymore. There are much better tools by now. ANTLR, for example, can produce a parser that can recover after an error, so you can still parse the whole file. Or check this Wikipedia article for more options.