ReasonML vs Elm

I saw this ReasonML vs TypeScript question here at StackOverflow, now I'm wondering how ReasonML and Elm compare to each other.
What are their similarities and differences? Which one should I use when?
What's the advantage of one over the other?

I'm not intimately familiar with Elm, but I've looked into it a bit and I'm pretty familiar with Reason, so I'll give it a shot. I'm sure there will be inaccuracies here though, so please don't take anything I say as fact, but use it instead as pointers for what to look into in more detail yourself if it matters to you.
Both Elm and Reason are ML-like languages with very similar programming models, so I'll focus on the differences.
Syntax:
Elm uses a Haskell-like syntax that is designed (and/or evolved) for the programming model both Elm and Reason use, so it should work very well for reading and writing idiomatic code once you're familiar with it, but it will seem very different and unfamiliar to most programmers.
Reason tries to be more approachable by emulating JavaScript's syntax as much as possible, which will be familiar to most programmers. However, it also aims to support the entire feature set of the underlying OCaml language, which makes some functional patterns quite awkward.
One example of this is the function application syntax, which in Elm emphasizes the curried nature of functions (f a b) and works very well for composing functions and building readable DSLs. Reason's parenthesized syntax (f(a, b)) hides this complexity, which makes it easier to get into (until you accidentally trip on it, since it's of course still different underneath), but makes heavy use of function composition a mess of parentheses.
Mutability:
Elm is a purely functional language, which is great in theory but challenging in practice since the surrounding world cares little about Elm's quest for purity. Elm's preferred solution to this, I think, is to isolate the impurity by writing the offending code in JavaScript instead, and to then access it in Elm through either web components or ports. This means you might have to maintain significant amounts of code in a separate and very unsafe language, quite a bit of boilerplate to connect them, as well as having to figure out how to fit the round things through the square holes of ports and such in the first place.
Reason on the other hand is... pragmatic, as I like to call it. You sacrifice some safety, ideals and long-term benefits for increased productivity and short-term benefits. Isolating impurity is still good practice in Reason, but you're inevitably going to take short-cuts just to get things done, and that is going to bite you later.
But even if you do manage to be disciplined enough to isolate all impurity, you still have to pay a price to have mutation in the language. Part of that price is what's called the value restriction, which you're going to run into sooner or later, and it's going to both confuse and infuriate you, since it will reject code that intuitively should work, just because the compiler is unable to prove that there can't at some point be a mutable reference involved.
JavaScript interoperability:
As mentioned above, Elm provides the ability to interoperate with JavaScript through ports and web components, which are deliberately quite limited. You used to be able to use native modules, which offered much more flexibility (and ability to shoot yourself in the foot), but that possibility is going away (for the plebs at least), a move that has not been uncontroversial (but also shouldn't be all that surprising given the philosophy). Read more about this change here
Reason, or rather BuckleScript, provides a rich set of primitives to be able to bind directly to JavaScript, and can very often produce an idiomatic Reason interface without needing to write any glue code. And while not very intuitive, it's pretty easy to do once you grok it. It's also easy to get it wrong and have it blow up in your face at some random point later, however. Whatever glue code you do have to write to provide a nice idiomatic API can be written in Reason, with all its safety guarantees, instead of having to write unsafe JavaScript.
Ecosystem:
As a consequence of Elm's limited JavaScript interoperability, the ecosystem is rather small. There aren't a whole lot of good quality third party JavaScript libraries that provide web components, and doing it yourself takes a lot of effort. So you'll instead see libraries being implemented directly in Elm itself, which takes even more effort, of course, but will often result in higher quality since they're specifically designed for Elm.
Tooling:
Elm is famous for its great error messages. Reason, to a large degree, does not have them, though it strives to. This is at least partly because Reason is not itself a compiler but is instead built on top of the OCaml compiler, so the information available is limited, and the surface area of possible errors is very large. But they're also not as well thought through.
Elm also has a great packaging tool which sets everything up for you and even checks whether the interface of a package you're publishing has changed and that the version bump corresponds to semantic versioning. Reason/BuckleScript just uses npm and requires you to manage everything Reason/BuckleScript-specific manually, like updating bsconfig.json with new dependencies.
Reason, BuckleScript, its build system, and OCaml are all blazing fast though. I've yet to experience any project taking more than 3 seconds to compile from scratch, including all dependencies, and incremental compilation usually takes only milliseconds (though this isn't entirely without cost to user-friendliness). Elm, as I understand it, is not quite as performant.
Elm and Reason both have formatting tools, but Reason-formatted code is of significantly poorer quality (though slowly improving). I think this is largely because of the vastly more complex syntax it has to deal with.
Maturity and decay:
Reason, being built on OCaml, has roots going back more than 20 years. That means it has a solid foundation that's been battle-tested and proven to work over a long period of time. Furthermore, it's a language largely developed by academics, which means a feature might take a while to get implemented, but when it does get in it's rock solid because it's grounded in theory and possibly even formally proven. On the downside, its age and experimental nature also means it's gathered a bit of cruft that's difficult to get rid of.
Elm on the other hand, being relatively new and less bureaucratically managed, can move faster and is not afraid of breaking with the past. That makes for a slimmer and more coherent language, but also one with a less powerful type system.
Portability:
Elm compiles to JavaScript, which in itself is quite portable, but is currently restricted to the browser, and even more so to the Elm Architecture. This is a choice, and it wouldn't be too difficult to target node or other platforms. But the argument against it is, as I understand it, that it would divert focus, thereby making it less excellent at its niche.
Reason, being based on OCaml, actually targets native machine code and bytecode first and foremost, but also has a JavaScript compiler (or two) that enables it to target browsers, node, electron, react native, and even the ability to compile into a unikernel. Windows support is supposedly a bit sketchy though. As an ecosystem, Reason targets React first and foremost, but also has libraries allowing the Elm Architecture to be used quite naturally.
Governance:
Elm is designed and developed by a single person who's able to clearly communicate his goals and reasoning and who's being paid to work on it full-time. This makes for a coherent and well-designed end product, but development is slow, and the bus factor might make investment difficult.
Reason's story is a bit more complex, as it's more of an umbrella name for a collection of projects.
OCaml is managed, designed and developed in the open, largely by academics but also by developers sponsored by various foundations and commercial backers.
BuckleScript, a JavaScript compiler that derives from the OCaml compiler, is developed by a single developer whose goals and employment situation are unclear, and who does not bother to explain his reasoning or decisions. Development is technically more open in that PRs are accepted, but the lack of explanation and obtuse codebase makes it effectively closed development. Unfortunately this does not lead to a particularly coherent design either, and the bus factor might make investment difficult here as well.
Reason itself, and ReasonReact, is managed by Facebook. PRs are welcomed, and a significant amount of Reason development is driven by outsiders, but most decisions seem to be made in a back room somewhere. PRs to ReasonReact, beyond trivial typo fixes and such, are often rejected, probably for good reason but usually with little explanation. A better design will then typically emerge from the back room sometime later.

Related

Pros and Cons of using object oriented programming for Progress OpenEdge

I understand the pros and cons of using object oriented programming as a concept. What I'm looking for are the pros and cons of using oo in progress/openedge specifically. Are there challenges that I need to take into account? Are there parts of the language that don't mesh well with oo? Stuff like that.
Edit: using 10.2b
I'll give you my opinion, but be forewarned that I'm probably the biggest Progress hater out there. ;) That said, I have written several medium-sized projects in OOABL so I have some experience in the area. These are some things I wrote, just so you know I'm not talking out of my hat:
STOMP protocol framework for clients and servers
a simple ORM mimicking ActiveRecord
an ABL compiler interface for the organization I was at (backend and frontend)
a library for building up Excel and Word documents (serializing them using the MS Office 2003 XML schemas; none of that silly COM stuff)
an email client that could send emails using multiple strategies
OOABL Pros:
If you absolutely must write Progress code, it is a great option for creating reusable code.
Great way to clean up an existing procedural codebase
OOABL Cons:
Class hierarchies are limited; you can't create inherited (sub-)interfaces in 10.2B (I think this was going to be added in 11). Older versions of OpenEdge have other limitations like lack of abstract classes. This limits your ability to create clean OO design and will hurt you when you start building non-trivial things.
Error handling sucks - CATCH/THROW doesn't let you throw your custom errors and force callers to catch them. Backwards compatibility prevents this from evolving further, so I doubt it will ever improve.
Object memory footprint is large, and there are no AVM debugging tools to track down why (gotta love these closed systems!)
Garbage collection didn't exist until 10.2A, and still has some bugs even in 11 (see the official OE forum for some examples)
Network programming (with sockets) is a PITA - you have to run a separate persistent procedure to manage the socket. I think evented programming in OOABL was a PITA in general; I remember getting a lot of errors about "windowed environments" or something to that effect when trying to use them. PUBLISH/SUBSCRIBE didn't work either, if memory serves.
Depending on your environment, code reviews may be difficult, as most Progress developers don't do OOABL and so may not understand your code
If the above point is true, you may face active resistance from entrenched developers who feel threatened by having to learn new things
OO is all about building small, reusable pieces that can be combined to make a greater whole. A big problem with OOABL is that the “ABL” part drags you down with its coarse data structures and lack of enumerators, which prevent you from really being able to build truly beautiful things with it. Unfortunately, since it is a closed language you can’t just sidestep the hand you’re dealt and create your own new data or control structures for it externally.
Now, it is theoretically possible to try and build some of these things using MEMPTRs, fixed arrays (EXTENT), and maybe WORK-TABLEs. However, I had attempted this in 10.1C and the design fell apart due to the lack of interface inheritance and abstract classes, and as I expected, performance was quite bad. The latter part may just be due to my poor ability, but I suspect it's an implementation limitation that would be nigh impossible to surmount.
The bottom line is use OOABL if you absolutely must be coding in OpenEdge - it’s better than procedural ABL and the rough edges get slightly smoother after each iterative release of OpenEdge. However, it will never be a beautiful language (OO or otherwise).
If you want to learn proper object-oriented programming and aren’t constricted to ABL, I would highly recommend looking at a language that treats objects as a first-class citizen such as Ruby or Smalltalk.
In the last four years I have worked 80% of the time with OOABL (started with 10.1c).
I definitely recommend using OOABL but I think it is very important to consider that using OOABL the same way as in other OO languages is fraught with problems.
With "the same way" I mean design patterns and implementation practices that are common in the oo world. Also some types of applications, especially in the area of technical frameworks are hard to do with OpenEdge (e.g. ORM).
The causes are performance problems with OOABL and missing OO features in the language.
If you are programming in C# or Java, for example, memory footprint and instantiation time of objects are not a big issue in many cases. Using ABL this becomes a big issue much more often.
This leads to other design decisions and prevents the implementation of some patterns and frameworks.
Some missing or bad OO features:
No class library, and none of the data structures needed for OO
No package visibility as in Java (internal in C#)
This becomes relevant especially in larger applications
No Generics
Really terrible Exception Handling
Very limited reflection capabilities (improved in oe11)
So if you are familiar with OO programming in other languages and start working with OOABL, you will likely reach a point where you miss a lot of things you expect to be there, and get frustrated when trying to implement such things in ABL.
If your application has to run on Windows only, it is also possible to implement new OO code in C# and call it from your existing Progress code via the CLR bridge, which works very smoothly.
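To make that a bit more concrete, here is a rough sketch of the kind of plain C# class you might expose to ABL over the CLR bridge (the WidgetCatalog class and all its members are invented for illustration, and the Progress-side setup is omitted). It leans on generics, a real collection class and structured exceptions - exactly the kinds of things listed above as missing from OOABL:

using System;
using System.Collections.Generic;

namespace BridgeSamples
{
    // A plain C# class intended to be called from ABL via the CLR bridge.
    public class WidgetCatalog
    {
        private readonly Dictionary<string, decimal> prices =
            new Dictionary<string, decimal>();

        public void AddWidget(string name, decimal price)
        {
            if (string.IsNullOrEmpty(name))
                throw new ArgumentException("Widget name must not be empty.", "name");
            prices[name] = price;
        }

        public decimal GetPrice(string name)
        {
            decimal price;
            if (!prices.TryGetValue(name, out price))
                throw new KeyNotFoundException("No widget named " + name);
            return price;
        }

        public int Count
        {
            get { return prices.Count; }
        }
    }
}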
Only one thing - "Error handling sucks" - it does suck, but not because you cannot make your own error classes and catch them in the caller block - that works, and I'm using it. What sucks is the mix of the old NO-ERROR / ERROR-HANDLE options with Progress.Lang.Error / CATCH blocks and ROUTINE-LEVEL ON ERROR UNDO, THROW. That is a big problem when the team has no convention for which error handling will be used and how.

How do you write good highly useful general purpose libraries?

I asked this question about Microsoft .NET Libraries and the complexity of its source code. From what I'm reading, writing general purpose libraries and writing applications can be two different things. When writing libraries, you have to think about the client who could literally be everyone (supposing I release the library for use in the general public).
What kind of practices or theories or techniques are useful when learning to write libraries? Where do you learn to write code like the one in the .NET library? This looks like a "black art" which I don't know too much about.
That's a pretty subjective question, but here's one objective answer. The Framework Design Guidelines book (be sure to get the 2nd edition) is a very good book about how to write effective class libraries. The content is very good and the often dissenting annotations are thought-provoking. Every shop should have a copy of this book available.
You definitely need to watch Josh Bloch in his presentation How to Design a Good API & Why it Matters (1h 9m long). He is a Java guru but library design and object orientation are universal.
One piece of advice often ignored by library authors is to internalize costs. If something is hard to do, the library should do it. Too often I've seen the authors of a library push something hard onto the consumers of the API rather than solving it themselves. Instead, look for the hardest things and make sure the library does them or at least makes them very easy.
I will be paraphrasing from Effective C++ by Scott Meyers, which I have found to be the best advice I got:
Adhere to the principle of least astonishment: strive to provide classes whose operators and functions have a natural syntax and an intuitive semantics. Preserve consistency with the behavior of the built-in types: when in doubt, do as the ints do.
Recognize that anything somebody can do, they will do. They'll throw exceptions, they'll assign objects to themselves, they'll use objects before giving them values, they'll give objects values and never use them, they'll give them huge values, they'll give them tiny values, they'll give them null values. In general, if it will compile, somebody will do it. As a result, make your classes easy to use correctly and hard to use incorrectly. Accept that clients will make mistakes, and design your classes so you can prevent, detect, or correct such errors.
Strive for portable code. It's not much harder to write portable programs than to write unportable ones, and only rarely will the difference in performance be significant enough to justify unportable constructs.
Even programs designed for custom hardware often end up being ported, because stock hardware generally achieves an equivalent level of performance within a few years. Writing portable code allows you to switch platforms easily, to enlarge your client base, and to brag about supporting open systems. It also makes it easier to recover if you bet wrong in the operating system sweepstakes.
Design your code so that when changes are necessary, the impact is localized. Encapsulate as much as you can; make implementation details private.
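As a small illustration of the "easy to use correctly, hard to use incorrectly" and "do as the ints do" points above, here is a rough C# sketch (the Celsius type is invented for this example, and plenty is omitted): instead of passing raw doubles around and hoping callers respect the units and valid range, the type itself enforces the rules and compares the way a built-in value type would.

using System;

// A tiny value type that is hard to misuse: it validates on construction,
// is immutable, and compares the way callers intuitively expect.
public struct Celsius : IEquatable<Celsius>, IComparable<Celsius>
{
    private readonly double degrees;

    public Celsius(double degrees)
    {
        if (degrees < -273.15)
            throw new ArgumentOutOfRangeException("degrees", "Below absolute zero.");
        this.degrees = degrees;
    }

    public double Degrees { get { return degrees; } }

    public int CompareTo(Celsius other) { return degrees.CompareTo(other.degrees); }
    public bool Equals(Celsius other) { return degrees == other.degrees; }

    // "Do as the ints do": comparisons read just like comparing built-in numbers.
    public static bool operator <(Celsius a, Celsius b) { return a.CompareTo(b) < 0; }
    public static bool operator >(Celsius a, Celsius b) { return a.CompareTo(b) > 0; }

    public override bool Equals(object obj) { return obj is Celsius && Equals((Celsius)obj); }
    public override int GetHashCode() { return degrees.GetHashCode(); }
    public override string ToString() { return degrees + " °C"; }
}

// Usage: new Celsius(-300) throws immediately instead of silently producing nonsense,
// and new Celsius(20) < new Celsius(25) reads just like comparing ints.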
Edit: I just noticed I very nearly duplicated what cherouvim had posted; sorry about that! But turns out we're linking to different speeches by Bloch, even if the subject is exactly the same. (cherouvim linked to a December 2005 talk, I to January 2007 one.) Well, I'll leave this answer here — you're probably best off by watching both and seeing how his message and way of presenting it has evolved :)
FWIW, I'd like to point to this Google Tech Talk by Joshua Bloch, who is a greatly respected guy in the Java world, and someone who has given speeches and written extensively on API design. (Oh, and designed some exceptionally good general purpose libraries, like the Java Collections Framework!)
Joshua Bloch, Google Tech Talks, January 24, 2007:
"How To Design A Good API and Why it
Matters" (the video is about 1 hour long)
You can also read many of the same ideas in his article Bumper-Sticker API Design (but I still recommend watching the presentation!)
(Seeing you come from the .NET side, I hope you don't let his Java background get in the way too much :-) This really is not Java-specific for the most part.)
Edit: Here's another 1½ minute bit of wisdom by Josh Bloch on why writing libraries is hard, and why it's still worth putting effort in it (economies of scale) — in a response to a question wondering, basically, "how hard can it be". (Part of a presentation about the Google Collections library, which is also totally worth watching, but more Java-centric.)
Krzysztof Cwalina's blog is a good starting place. His book, Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries, is probably the definitive work for .NET library design best practices.
http://blogs.msdn.com/kcwalina/
The number one rule is to treat API design just like UI design: gather information about how your users really use your UI/API, what they find helpful and what gets in their way. Use that information to improve the design. Start with users who can put up with API churn and gradually stabilize the API as it matures.
I wrote a few notes about what I've learned about API design here: http://www.natpryce.com/articles/000732.html
I'd start looking more into design patterns. You're probably not going to find much use for some of them, but as you get deeper into your library design the patterns will become more applicable. I'd also pick up a copy of NDepend - a great code measuring utility which may help you decouple things better. You can use the .NET libraries as an example, but, personally, I don't find them to be great design examples, mostly due to their complexities. Also, start looking at some open source projects to see how they're layered and structured.
A couple of separate points:
The .NET Framework isn't a class library. It's a Framework. It's a set of types meant to not only provide functionality, but to be extended by your own code. For instance, it does provide you with the Stream abstract class, and with concrete implementations like the NetworkStream class, but it also provides you the WebRequest class and the means to extend it, so that WebRequest.Create("myschema://host/more") can produce an instance of your own class deriving from WebRequest, which can have its own GetResponse method returning its own class derived from WebResponse, such that calling GetResponseStream will return your own class derived from Stream!
And your callers will not need to know this is going on behind the scenes!
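For illustration, here is a rough sketch of what that extension point can look like (the "myschema" prefix and the MySchema* classes are made up here, and almost all overrides and error handling are omitted):

using System;
using System.IO;
using System.Net;
using System.Text;

// A hypothetical scheme "myschema" registered so WebRequest.Create can produce our type.
class MySchemaRequest : WebRequest
{
    private readonly Uri uri;
    public MySchemaRequest(Uri uri) { this.uri = uri; }

    public override Uri RequestUri { get { return uri; } }

    public override WebResponse GetResponse()
    {
        // A real implementation would talk to whatever backs "myschema://".
        return new MySchemaResponse("hello from " + uri);
    }
}

class MySchemaResponse : WebResponse
{
    private readonly byte[] payload;
    public MySchemaResponse(string text) { payload = Encoding.UTF8.GetBytes(text); }

    public override Stream GetResponseStream()
    {
        return new MemoryStream(payload); // our own Stream-derived object
    }
}

class MySchemaCreator : IWebRequestCreate
{
    public WebRequest Create(Uri uri) { return new MySchemaRequest(uri); }
}

class Program
{
    static void Main()
    {
        WebRequest.RegisterPrefix("myschema", new MySchemaCreator());

        WebRequest request = WebRequest.Create("myschema://host/more");
        using (var reader = new StreamReader(request.GetResponse().GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd()); // callers never see the concrete types
        }
    }
}

The caller only ever sees WebRequest, WebResponse and Stream; the concrete types stay hidden behind the framework's factory.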
A separate point is that for most developers, creating a reusable library is not, and should not be the goal. The goal should be to write the code necessary to meet requirements. In the process, reusable code may be found. In that case, it should be refactored out into a separate library, where it can be reused in the future.
I go further than that (when permitted). I will usually wait until I find two pieces of code that actually do the same thing, or which overlap. Presumably both pieces of code have passed all their unit tests. I will then factor out the common code into a separate class library and run all the unit tests again. Assuming that they still pass, I've begun the creation of some reusable code that works (since the unit tests still pass).
This is in contrast to a lesson I learned in school, when the result of an entire project was a beautiful reusable library - with no code to reuse it.
(Of course, I'm sure it would have worked if any code had used it...)

Agile practices to avoid deprecated code? [closed]

I am converting an open source Java library to C#, which has a number of methods and classes tagged as deprecated. This project is an opportunity to start with a clean slate, so I plan to remove them entirely. However, being new to working on larger projects, I am nervous that the situation will arise again. Since much of agile development revolves around making something work now and refactoring later if needed, it seems like deprecation of APIs must be a common problem. Are there preventative measures I can take to avoid/minimize API deprecation, even if I am not entirely sure of the future direction of a project?
I'm not sure there is much you can do. Requirements change, and if you absolutely have to make sure that clients of the API are not broken by newer API versions, you'll have to rely on simply deprecating code until you think that no-one is using the deprecated code.
Placing [Obsolete] attributes on code causes the compiler to create warnings if there are any references to the obsolete methods. This way clients of the API, if they are diligent about fixing their compiler warnings, can gradually move to the new methods without having everything break with the new version.
It's useful to use the ObsoleteAttribute constructor overload which takes a string:
[Obsolete("Foo is deprecated. Use Bar instead for munging widgets.")]
<frivolous>
Perhaps you could create a TimeBombAttribute:
[TimeBomb(new DateTime(2010,1,1), "Foo will blow up! Better use Bar, or else.")]
In your code, reflect for methods with the timebomb attribute and throw KaboomException if they are called after the specified date. That'll make sure that after 1st January 2010 no-one is using the obsolete methods, and you can clean up your API nicely. :)
</frivolous>
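Staying in frivolous territory, here is roughly what that could look like in C# (TimeBombAttribute, KaboomException and the explicit Check call are all invented here; note that a real attribute can't take new DateTime(...) as an argument, since attribute arguments must be compile-time constants, so the date is passed as plain numbers):

using System;
using System.Reflection;

[AttributeUsage(AttributeTargets.Method)]
class TimeBombAttribute : Attribute
{
    public DateTime ExplodesOn { get; private set; }
    public string Message { get; private set; }

    public TimeBombAttribute(int year, int month, int day, string message)
    {
        ExplodesOn = new DateTime(year, month, day);
        Message = message;
    }
}

class KaboomException : Exception
{
    public KaboomException(string message) : base(message) { }
}

static class TimeBomb
{
    // Call this at the top of any [TimeBomb]-tagged method, passing the method itself.
    public static void Check(MethodBase method)
    {
        var bomb = (TimeBombAttribute)Attribute.GetCustomAttribute(method, typeof(TimeBombAttribute));
        if (bomb != null && DateTime.Now > bomb.ExplodesOn)
            throw new KaboomException(bomb.Message);
    }
}

class Widgets
{
    [TimeBomb(2010, 1, 1, "Foo will blow up! Better use Bar, or else.")]
    public void Foo()
    {
        TimeBomb.Check(MethodBase.GetCurrentMethod());
        // ... old implementation ...
    }

    public void Bar() { /* ... new implementation ... */ }
}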
As Matt says, the Obsolete attribute is your friend... but whenever you apply it, provide details of how to change calling code. That way you've got a lot better chance of people actually changing. You might also want to consider specifying which version you anticipate removing the method in (probably the next major release).
Of course, you should be diligent in making sure you don't call the obsolete code - particularly in sample code.
Since much of agile development revolves around making something work now and refactoring later if needed
That's not agile. It's cowboy coding disguised under the label of agile.
The ideal is that whatever you complete, is complete, according to whatever Definition of Done you have. Usually the DoD states something along the lines of "feature implemented, tested and related code refactored". Of course, if you are working on a throwaway prototype, you can have a more relaxed DoD.
API modifications are a difficult beast. If they are only project-internal APIs you are modifying, the best way to go is to refactor early. If you need to change the internal API, just go ahead and change all API clients at the same time. This way the refactoring debt does not grow very large and you don't have to use deprecation.
For published APIs you probably have some source and binary compatibility guarantees you have to maintain, at least until the next major release or so. Marking the old APIs deprecated works while maintaining compatibility. As with internal APIs, you should fix your internal code as soon as possible to not use the deprecated APIs.
Matt's answer is solid advice. I just wanted to mention that initially you probably want to use something along the lines of:
[Obsolete("Please use ... instead ", false)]
Once you have the code ported, change the false to true and the compiler will then treat all the calls to the method as an error.
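A minimal sketch of that two-phase approach (the WidgetMunger class and its method names are made up here):

public class Widget { /* ... */ }

public class WidgetMunger
{
    // Phase 1: warn but still compile.
    [Obsolete("Please use MungeWidget(Widget) instead.", false)]
    public void MungeWidget(string name) { /* ... old implementation ... */ }

    // Phase 2: once callers are ported, flip false to true and
    // any remaining call to the old overload becomes a compile error.
    // [Obsolete("Please use MungeWidget(Widget) instead.", true)]

    public void MungeWidget(Widget widget) { /* ... new implementation ... */ }
}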
Watch Josh Bloch's "How to Design a Good API and Why It Matters"
Most important w/r/t deprecation is knowing that "when in doubt, leave it out." Watch the video for clarification, but it has to do with having to support what you provide forever. If you are realistically expecting that API to be reused, you're effectively setting your decisions in stone.
I think API design is a much trickier thing to do in an Agile fashion because you're expecting it to be reused probably in many different ways. You have to worry about breaking others that are dependent on you, and so while it can be done, it's tough to have the right design emerge without getting a quick turnaround from other teams. Of course deprecation is going to help here, but I think YAGNI is a lot better design heuristic when it comes to APIs.
I think deprecation of code is an inevitable byproduct of Agile processes like continuous refactoring and incremental development. So if you end up with deprecated code as you work on your project, that's not necessarily a bad thing--just a fact of life. Of course, you will probably find that, rather than deprecating code, you end up keeping a lot of code but refactoring it into different methods, classes, and so on.
So, bottom line: I wouldn't worry about deprecating code during Agile development. If it served its purpose for a while, you're doing the right thing.
The rule of thumb for API design is to focus on what it does, rather than how it does it. Once you know the end goal, figure out the absolute minimum input you need and use that. Avoid passing your own objects as parameters, pass only data.
Separate configuration from execution. For example, maybe you have an image encoder/decoder.
Instead of making a call like:
Encoder.Encode( bytes, width, height, compression_type, compression_ratio, palette, etc etc);
Make it
Encoder.setCompressionType(compression_type);
Encoder.setCompressionRatio(compression_ratio);
etc., etc.
Encoder.Encode(bytes, width, height);
That way adding or removing settings is much less likely to break existing implementations.
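One sketch of that idea in C# (Encoder and EncoderSettings are invented here) is to move the rarely-changing knobs onto a settings object with sensible defaults, so the Encode call itself never has to change shape:

public class EncoderSettings
{
    // Defaults mean callers only set what they care about;
    // adding a new setting later doesn't break existing call sites.
    public string CompressionType { get; set; }
    public double CompressionRatio { get; set; }
    public byte[] Palette { get; set; }

    public EncoderSettings()
    {
        CompressionType = "none";
        CompressionRatio = 1.0;
    }
}

public class Encoder
{
    private readonly EncoderSettings settings;

    public Encoder(EncoderSettings settings)
    {
        this.settings = settings ?? new EncoderSettings();
    }

    public byte[] Encode(byte[] bytes, int width, int height)
    {
        // ... apply this.settings while encoding ...
        return bytes; // placeholder
    }
}

// Usage:
// var encoder = new Encoder(new EncoderSettings { CompressionType = "rle" });
// var output = encoder.Encode(pixels, 640, 480);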
For deprecation, there's basically 3 types of APIs: internal, external, and public.
Internal is when it's only your team working on the code. Deprecating these APIs isn't a big deal. Your team is the only one using them, so they aren't around long, there's pressure to change them, people aren't afraid to change them, and people know how to change them.
External is when it's the same code base, but different teams are using it. This might be some common libraries in a large company, or a popular open source library. The point is, people can choose the version of the code they compile with. The ease of deprecating an API depends on the size of the organization and how well they communicate. IMO, it's the deprecator's job to update old code, rather than mark it deprecated and let warnings fly throughout the code base. Why the deprecator instead of the deprecatee? Because the deprecator is in the know; they know what changed and why.
Those two cases are pretty easy. So long as there is backwards compatibility, you can generally do whatever you'd like, update the clients yourself, or convince the maintainers to do it.
Then there are public APIs. These are basically external APIs that the clients don't have much control over, such as a web API. These are incredibly hard to update or deprecate. Most won't notice it's broken, won't have someone to fix it, won't get notifications that it's changing, and will only fix it once it's broken (after they've yelled at you for breaking it, of course).
I've had to do the above a few times, and it is such a chore. I think the best you can do is purposefully break it early, wait a bit, and then restore it. You send out the usual warnings and deprecations first, of course, but - trust me - nothing will happen until something breaks.
An idea I've yet to try is to let people register simple apps that run small tests. When you want to do an API update, you run the external tests and contact the affected people.
Another approach that has become popular is to have clients depend on (web) services. There are constructs out there that allow you to version your services and allow clients to perform lookups. This adds a lot more moving parts and complexity into the equation, but can be helpful if you are looking at turning over a lot of versions, and having to support multiple versions in production.
This article does a good job of explaining the problem and an approach.

What are the disadvantages of Aspect-Oriented Programming (AOP)?

What are the possible and critical disadvantages of Aspect-Oriented Programming?
For example: cryptic debugging for newbies (readability impact)
I think the biggest problem is that nobody knows how to define the semantics of an aspect, or how to declare join points non-procedurally.
If you can't define what an aspect does independently of the context in which it will be embedded, or define the effects that it has in such a way that it doesn't damage the context in which it is embedded, you (and tools) can't reason about what it does reliably. (You'll note the most common example of aspects is "logging", which is defined as "write some stuff to a log stream the application doesn't know anything about", because this is pretty safe). This violates David Parnas' key notion of information hiding. One of the worst examples of aspects that I see are ones that insert synchronization primitives into code; this affects the sequence of possible interactions that the code can have. How can you possibly know this is safe (won't deadlock? won't livelock? won't fail to protect? is recoverable in the face of a thrown exception in a synchronization partner?) unless the application only does trivial things already.
Join points are now typically defined by providing some kind of identifier wildcarding (e.g., "put this aspect into any method whose name matches DataBaseAccess*"). For this to work, the folks writing the affected methods have to signal the intention for their code to be aspectized by naming their code in a funny way; that's hardly modular. Worse, why should the victim of an aspect even have to know that it exists? And consider what happens if you simply rename some methods: the aspect is no longer injected where it is needed, and your application breaks. What is needed are join point specifications which are intentional; somehow, the aspect has to know where it is needed without the programmers placing a neon sign at each usage point. (AspectJ has some control-flow related join points which seem a bit better in this regard.)
So aspects are kind of interesting idea, but I think they are technologically immature. And that immaturity makes their usage fragile. And that's where the problem lies.
(I'm a big fan of automated software engineering tools [see my bio] just not this way).
Poor toolchain support - debuggers, profilers, etc. may not know about the AOP and so may work on the code as if all the aspects had been replaced by procedural code
Code bloat - small source can lead to much larger object code as code is "weaved" throughout the code base
I think the biggest disadvantage is using AOP well. People use it in places where it doesn't make sense, for example, and will not use it where it does.
For example, a factory pattern is obviously something that AOP can do better, as DI can also do it well, but the observer pattern is simpler when using AOP, as is the strategy pattern.
It will be harder to unit test, especially if you do the weaving at runtime.
If you weave at runtime, you also take a performance hit.
Having a good way to model what is going on when mixing AOP with classes is a problem, as I don't think UML is a good model at that point.
Unless you are using Eclipse, the tools do have problems; with Eclipse and AJDT, AOP is much easier.
We still use JUnit and NUnit, for example, and so have to modify our code to allow unit tests to run; using privileged mode, AOP could do better unit tests by testing private methods as well, and we wouldn't have to change our programs just to make them work with unit testing. This is another example of not really understanding how AOP can be helpful: we are still chained in many ways to the past with unit-test frameworks and current design pattern implementations, and don't see how AOP could help us do better coding.
Maintenance and Debugging. With aop, you suddenly have code that is being run at a given point ( method entry, exit, whatever ) but in just looking at the code, you have no clue that it's even getting called, especially if the aop configuration is in another file, like xml config. If the advice causes some changes, then while debugging an application, things may look strange with no explanation. This doesn't affect only newbies.
I would not call it a critical disadvantage, but the biggest problem I have seen is one of developer experience and ability to adapt. Not all developers yet understand the difference between declarative and imperative programming.
We make use of the policy injection application block in EntLib 4.1 pretty extensively, as well as Unity for DI, and it's just not something that sinks in quickly for some people. I find myself explaining over and over again to the same people why the application is not behaving in the way they expect. It generally starts off with them explaining something and me saying "see that declaration above the method." :) Some people get it right away, love it and become extremely productive -- others struggle.
The learning curve is not unique to AOP, but it seems to be higher than for other things that your average developer encounters.
With regard to the maintenance/debugging argument, aspect-oriented programming tends to go hand-in-hand with all the other aspects of agile software-development practices.
These practices tend to remove debugging from the picture, replacing it with unit testing and test-driven development.
In addition, it may be much easier to maintain a small, clear code footprint with advice than a large, incomprehensible code footprint without advice (the advice being the thing that transforms a large, incomprehensible code footprint into a small, clear code footprint).
Because of the power of AOP, if there is a bug in your crosscutting code, it can cause widespread problems. On the other hand, someone might change the join points in a program – e.g., by renaming or moving methods – in ways that the aspect writer did not expect, with unintended consequences. One advantage of modularizing crosscutting concerns is enabling one programmer to affect the entire system easily.

Practices for programming in a scientific environment? [closed]

Background
Last year, I did an internship in a physics research group at a university. In this group, we mostly used LabVIEW to write programs for controlling our setups, doing data acquisition and analyzing our data. For the first two purposes, that works quite OK, but for data analysis, it's a real pain. On top of that, everyone was mostly self-taught, so code that was written was generally quite a mess (no wonder that every PhD quickly decided to rewrite everything from scratch). Version control was unknown, and impossible to set up because of strict software and network regulations from the IT department.
Now, things actually worked out surprisingly OK, but how do people in the natural sciences do their software development?
Questions
Some concrete questions:
What languages/environments have you used for developing scientific software, especially data analysis? What libraries? (for example, what do you use for plotting?)
Was there any training for people without any significant background in programming?
Did you have anything like version control, and bug tracking?
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (especially physicists are stubborn people!)
Summary of answers thus far
The answers (or my interpretation of them) thus far: (2008-10-11)
Languages/packages that seem to be the most widely used:
LabVIEW
Python
with SciPy, NumPy, PyLab, etc. (See also Brandon's reply for downloads and links)
C/C++
MATLAB
Version control is used by nearly all respondents; bug tracking and other processes are much less common.
The Software Carpentry course is a good way to teach programming and development techniques to scientists.
How to improve things?
Don't force people to follow strict protocols.
Set up an environment yourself, and show the benefits to others. Help them to start working with version control, bug tracking, etc. themselves.
Reviewing other people's code can help, but be aware that not everyone may appreciate that.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I used to work for Enthought, the primary corporate sponsor of SciPy. We collaborated with scientists from the companies that contracted Enthought for custom software development. Python/SciPy seemed to be a comfortable environment for scientists. It's much less intimidating to get started with than say C++ or Java if you're a scientist without a software background.
The Enthought Python Distribution comes with all the scientific computing libraries including analysis, plotting, 3D visualisation, etc.
Was there any training for people without any significant background in programming?
Enthought does offer SciPy training and the SciPy community is pretty good about answering questions on the mailing lists.
Did you have anything like version control, bug tracking?
Yes, and yes (Subversion and Trac). Since we were working collaboratively with the scientists (and typically remotely from them), version control and bug tracking were essential. It took some coaching to get some scientists to internalize the benefits of version control.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
Make sure they are familiarized with the tool chain. It takes an investment up front, but it will make them feel less inclined to reject it in favor of something more familiar (Excel). When the tools fail them (and they will), make sure they have a place to go for help — mailing lists, user groups, other scientists and software developers in the organization. The more help there is to get them back to doing physics the better.
The course Software Carpentry is aimed specifically at people doing scientific computing and aims to teach the basics and lessons of software engineering, and how best to apply them to projects.
It covers topics like version control, debugging, testing, scripting and various other issues.
I've listened to about 8 or 9 of the lectures and think it is highly recommended.
Edit: The MP3s of the lectures are available as well.
Nuclear/particle physics here.
Major programming work used to be done mostly in Fortran using CERNLIB (PAW, MINUIT, ...) and GEANT3; recently it has mostly been done in C++ with ROOT and Geant4. There are a number of other libraries and tools in specialized use, and LabVIEW sees some use here and there.
Data acquisition in my end of this business has often meant fairly low level work. Often in C, sometimes even in assembly, but this is dying out as the hardware gets more capable. On the other hand, many of the boards are now built with FPGAs which need gate twiddling...
One-offs, graphical interfaces, etc. use almost anything (Tcl/Tk used to be big, and I've been seeing more Perl/Tk and Python/Tk lately) including a number of packages that exist mostly inside the particle physics community.
Many people writing code have little or no formal training, and process is transmitted very unevenly by oral tradition, but most of the software group leaders take process seriously and read as much as necessary to make up their deficiencies in this area.
Version control for the main tools is ubiquitous. But many individual programmers neglect it for their smaller tasks. Formal bug tracking tools are less common, as are nightly builds, unit testing, and regression tests.
To improve things:
Get on the good side of the local software leaders
Implement the process you want to use in your own area, and encourage those you let in to use it too.
Wait. Physicists are empirical people. If it helps, they will (eventually!) notice.
One more suggestion for improving things.
Put a little time into helping anyone you work directly with. Review their code. Tell them about algorithmic complexity/code generation/DRY or whatever basic thing they never learned because some professor threw a Fortran book at them once and said "make it work". Indoctrinate them on process issues. They are smart people, and they will learn if you give them a chance.
This might be slightly tangential, but hopefully relevant.
I used to work for National Instruments, R&D, where I wrote software for NI RF & Communication toolkits. We used LabVIEW quite a bit, and here are the practices we followed:
Source control. NI uses Perforce. We did the regular thing - dev/trunk branches, continuous integration, the works.
We wrote automated test suites.
We had a few people who came in with a background in signal processing and communication. We used to have regular code reviews, and best practices documents to make sure their code was up to the mark.
Despite the code reviews, there were a few occasions when "software guys", like me had to rewrite some of this code for efficiency.
I know exactly what you mean about stubborn people! We had folks who used to think that pointing out a potential performance improvement in their code was a direct personal insult! It goes without saying that this calls for good management. I thought the best way to deal with these folks is to go slowly, not press too hard for changes and, if necessary, be prepared to do the dirty work. [Example: write a test suite for their code.]
I'm not exactly a 'natural' scientist (I study transportation) but am an academic who writes a lot of my own software for data analysis. I try to write as much as I can in Python, but sometimes I'm forced to use other languages when I'm working on extending or customizing an existing software tool. There is very little programming training in my field. Most folks are either self-taught, or learned their programming skills from classes taken previously or outside the discipline.
I'm a big fan of version control. I used Vault running on my home server for all the code for my dissertation. Right now I'm trying to get the department to set up a Subversion server, but my guess is I will be the only one who uses it, at least at first. I've played around a bit with FogBugz, but unlike version control, I don't think that's nearly as useful for a one-man team.
As for encouraging others to use version control and the like, that's really the problem I'm facing now. I'm planning on forcing my grad students to use it on research projects they're doing for me, and encouraging them to use it for their own research. If I teach a class involving programming, I'll probably force the students to use version control there too (grading them on what's in the repository). As far as my colleagues and their grad students go, all I can really do is make a server available and rely on gentle persuasion and setting a good example. Frankly, at this point I think it's more important to get them doing regular backups than get them on source control (some folks are carrying around the only copy of their research data on USB flash drives).
1.) Scripting languages are popular these days for most things due to better hardware. Perl/Python/Lisp are prevalent for lightweight applications (automation, light computation); I see a lot of Perl at my work (computational EM) since we like Unix/Linux. For performance stuff, C/C++/Fortran are typically used. For parallel computing, well, we usually manually parallelize runs in EM as opposed to having a program implicitly do it (ie split up the jobs by look angle when computing radar cross sections).
2.) We just kind of throw people into the mix here. A lot of the code we have is very messy, but scientists are typically a scatterbrained bunch that don't mind that sort of thing. Not ideal, but we have things to deliver and we're severely understaffed. We're slowly getting better.
3.) We use SVN; however, we do not have bug tracking software. About as good as it gets for us is a txt file that tells you where specific bugs are.
4.) My suggestion for implementing best practices for scientists: do it slowly. As scientists, we typically don't ship products. No one in science makes a name for himself by having clean, maintainable code. They get recognition from the results of that code, typically. They need to see justification for spending time on learning software practices. Slowly introduce new concepts and try to get them to follow; they're scientists, so after their own empirical evidence confirms the usefulness of things like version control, they will begin to use it all the time!
I'd highly recommend reading "What Every Computer Scientist Should Know About Floating-Point Arithmetic". A lot of problems I encounter on a regular basis come from issues with floating point programming.
I am a physicist working in the field of condensed matter physics, building classical and quantum models.
Languages:
C++ -- very versatile: can be used for anything, good speed, but it can be a bit inconvenient when it comes to MPI
Octave -- good for some supplementary calculations, very convenient and productive
Libraries:
Armadillo/Blitz++ -- fast array/matrix/cube abstractions for C++
Eigen/Armadillo -- linear algebra
GSL -- to use with C
LAPACK/BLAS/ATLAS -- extremely big and fast, but less convenient (and written in FORTRAN)
Graphics:
GNUPlot -- it has very clean and neat output, but not that productive sometimes
Origin -- very convenient for plotting
Development tools:
Vim + plugins -- it works great for me
GDB -- a great debugging tool when working with C/C++
Code::Blocks -- I used it for some time and found it quite comfortable, but Vim is still better in my opinion.
I work as a physicist in a UK university.
Perhaps I should emphasise that different areas of research have different emphasis on programming. Particle physicists (like dmckee) do computational modelling almost exclusively and may collaborate on large software projects, whereas people in fields like my own (condensed matter) write code relatively infrequently. I suspect most scientists fall into the latter camp. I would say coding skills are usually seen as useful in physics, but not essential, much like physics/maths skills are seen as useful for programmers but not essential. With this in mind...
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Commonly, data analysis and plotting is done using generic data analysis packages such as IGOR Pro, ORIGIN and Kaleidegraph, which can be thought of as 'Excel plus'. These packages typically have a scripting language that can be used to automate. More specialist analysis may have a dedicated utility for the job, which generally will have been written a long time ago, which no-one has the source for, and which is pretty buggy. Some more techie types might use the languages that have been mentioned (Python, R, MatLab with Gnuplot for plotting).
Control software is commonly done in LabVIEW, although we actually use Delphi which is somewhat unusual.
Was there any training for people without any significant background in programming?
I've been to seminars on grid computing, 3D visualisation, learning Boost etc. given by both universities I've been at. As an undergraduate we were taught VBA for Excel and MatLab but C/MatLab/LabVIEW is more common.
Did you have anything like version control, bug tracking?
No, although people do have personal development setups. Our code base is in a shared folder on a 'server' which is kept current with a synching tool.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
One step at a time! I am trying to replace the shared folder with something a bit more solid; perhaps finding an SVN client which mimics the current synching tool's behaviour would help.
I'd say though on the whole, for most natural science projects, time is generally better spent doing research!
Ex-academic physicist and now industrial physicist UK here:
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I mainly use MATLAB these days (easy to access visualisation functions and maths). I used to use Fortran a lot and IDL. I have used C (but I'm more a reader than a writer of C) and Excel macros (ugly and confusing). I'm currently needing to be able to read Java and C++ (but I can't really program in them) and I've hacked Python as well. For my own entertainment I'm now doing some programming in C# (mainly to get portability / low cost / pretty interfaces). I can write Fortran in pretty much any language I'm presented with ;-)
Was there any training for people without any significant background in programming?
Most (all?) undergraduate physics courses will have a small programming course, usually on C, Fortran or MATLAB, but it's the real basics. I'd really like to have had some training in software engineering at some point (revision control / testing / designing medium scale systems).
Did you have anything like version control, bug tracking?
I started using Subversion / TortoiseSVN relatively recently. Groups I've worked with in the past have used revision control. I don't know any academic group which uses formal bug tracking software. I still don't use any sort of systematic testing.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
I would try to introduce some software engineering ideas at undergraduate level and then reinforce them by practice at graduate level, also provide pointers to resources like the Software Carpentry course mentioned above.
I'd expect that a significant fraction of academic physicists will be writing software (not necessarily all though) and they are in dire need of at least an introduction to ideas in software engineering.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Python, NumPy and pylab (plotting).
Was there any training for people without any significant background in programming?
No, but I was working in a multimedia research lab, so almost everybody had a computer science background.
Did you have anything like version control, bug tracking?
Yes, Subversion for version control, Trac for bug tracing and wiki. You can get free bug tracker/version control hosting from http://www.assembla.com/ if their TOS fits your project.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!).
Make sure the infrastructure is set up and well maintained and try to sell the benefits of source control.
I'm a statistician at a university in the UK. Generally people here use R for data analysis, it's fairly easy to learn if you know C/Perl. Its real power is in the way you can import and modify data interactively. It's very easy to take a number of say CSV (or Excel) files and merge them, create new columns based on others and then throw that into a GLM, GAM or some other model. Plotting is trivial too and doesn't require knowledge of a whole new language (like PGPLOT or GNUPLOT.) Of course, you also have the advantage of having a bunch of built-in features (from simple things like mean, standard deviation etc all the way to neural networks, splines and GL plotting.)
Having said this, there are a couple of issues. With very large datasets R can become very slow (I've only really seen this with >50,000x30 datasets) and since it's interpreted you don't get the advantage of Fortran/C in this respect. But, you can (very easily) get R to call C and Fortran shared libraries (either from something like netlib or ones you've written yourself.) So, a usual workflow would be to:
Work out what to do.
Prototype the code in R.
Run some preliminary analyses.
Re-write the slow code into C or Fortran and call that from R.
Which works very well for me.
I'm one of the only people in my department (of >100 people) using version control (in my case using git with github.com). This is rather worrying, but they just don't seem to be keen on trying it out and are content with passing zip files around (yuck.)
My suggestion would be to continue using LabVIEW for the acquisition (and perhaps trying to get your co-workers to agree on a toolset for acquisition and making it available for all) and then move to exporting the data into a CSV (or similar) and doing the analysis in R. There's really very little point in re-inventing the wheel in this respect.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
My undergraduate physics department taught LabVIEW classes and used it extensively in its research projects.
The other alternative is MATLAB, in which I have no experience. There are camps for either product; each has its own advantages/disadvantages. Depending on what kind of problems you need to solve, one package may be more preferable than the other.
Regarding data analysis, you can use whatever kind of number cruncher you want. Ideally, you can do the hard calculations in language X and format the output to plot nicely in Excel, Mathcad, Mathematica, or whatever the flavor du jour plotting system is. Don't expect standardization here.
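For example, the hand-off can be as simple as writing a CSV that Excel, Mathcad or Mathematica will happily plot; the quantities below are made up:

    # Crunch the numbers in your language of choice, then dump a CSV for
    # whatever plotting tool people already use. The quantities are made up.
    import numpy as np

    t = np.linspace(0.0, 10.0, 101)
    signal = np.exp(-t / 3.0) * np.cos(2.0 * np.pi * t)
    np.savetxt("results.csv", np.column_stack((t, signal)),
               delimiter=",", header="time,signal", comments="")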
Did you have anything like version control, bug tracking?
Looking back, we didn't, and it would have been easier for us all if we did. Nothing like breaking everything and struggling for hours to fix it!
Definitely use source control for any common code. Encourage individuals to write their code in a manner that could be made more generic. This is really just coding best practices. Really, you should have them teaching (or taking) a computer science class so they can get the basics.
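For instance, "made more generic" can be as simple as taking the file name and column as parameters instead of hard-coding them inside the analysis; the file and column names below are hypothetical.

    # A small illustration of writing code so it can be reused: parameterize
    # the inputs rather than hard-coding them. Names here are hypothetical.
    import numpy as np

    def column_mean(path, column, delimiter=","):
        """Return the mean of one named column of a delimited text file."""
        data = np.genfromtxt(path, delimiter=delimiter, names=True)
        return data[column].mean()

    # The same function then serves any CSV the DAQ setup produces, e.g.:
    # print(column_mean("run42.csv", "temperature"))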
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
There is a clear split between data acquisition (DAQ) and data analysis. Meaning, it's possible to standardize on the DAQ and then allow the scientists to play with the data in the program of their choice.
Another good option is Scilab. It has graphic modules à la LabVIEW, it has its own programming language and you can also embed Fortran and C code, for example. It's being used in public and private sectors, including big industrial companies. And it's free.
Regarding versioning, some prefer Mercurial, as it gives you more freedom in managing and defining the repositories. I have no experience with it, however.
For plotting I use Matplotlib. I will soon have to make animations, and I've seen good results using MEncoder. Here is an example including an audio track.
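For the animation side, a minimal Matplotlib sketch with FuncAnimation might look like this (the data is made up); writing it out to a video file additionally needs an external encoder such as MEncoder or ffmpeg available on the system.

    # Animate a sine wave with Matplotlib's FuncAnimation. Data is made up.
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    fig, ax = plt.subplots()
    x = np.linspace(0, 2 * np.pi, 200)
    line, = ax.plot(x, np.sin(x))

    def update(frame):
        # Shift the wave a little on each frame.
        line.set_ydata(np.sin(x + 0.1 * frame))
        return line,

    anim = FuncAnimation(fig, update, frames=100, interval=50)
    # anim.save("wave.mp4")  # requires an external encoder on the system
    plt.show()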
Finally, I suggest going modular, that is, trying to keep the main pieces of code in separate files, so code review, understanding, maintenance and improvement will be easier. I have written, for example, a Python module for file integrity testing, another for image processing sequences, etc.
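To give a rough idea, a small file-integrity module along those lines might look something like this (the names below are hypothetical, not the actual module):

    # Hypothetical sketch of a small, reusable file-integrity module.
    import hashlib

    def file_checksum(path, algorithm="sha256", chunk_size=65536):
        """Return the hex digest of a file, reading it in chunks."""
        digest = hashlib.new(algorithm)
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify(path, expected_digest, algorithm="sha256"):
        """Check a file against a previously recorded checksum."""
        return file_checksum(path, algorithm) == expected_digest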
You should also consider developing with the use of a debugger that lets you inspect variable contents at breakpoints set in the code, instead of using print lines.
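In Python that can be as simple as the standard-library pdb module; the function and values below are only an illustration:

    # Pause at a breakpoint with Python's built-in debugger instead of
    # sprinkling print statements. The function and data are illustrative.
    import pdb

    def reduce_run(samples):
        total = sum(samples)
        pdb.set_trace()   # execution pauses here; inspect samples/total, step, continue
        return total / len(samples)

    print(reduce_run([1.0, 2.0, 3.0]))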
I have used Eclipse for Python and Fortran development (although I got a spurious error compiling a short Fortran program with it, which may have been a bad configuration), and I'm starting to use the Eric IDE for Python. It lets you debug, manage versioning with SVN, use an embedded console, refactor with Bicycle Repair Man (other refactoring tools can be used too), run unit tests, and so on. A lighter alternative for Python is IDLE, included with Python since version 2.3.
As a few hints, I also suggest:
Not using single-character variable names. When you want to search for occurrences, you will get matches everywhere. Some argue that a decent IDE makes this easier, but then you will depend on having permanent access to the IDE. Even using ii, jj and kk can be enough, although this choice will depend on your language. (Double vowels would be less useful if code comments are written in Estonian, for instance.)
Commenting the code from the very beginning.
For critical applications it's sometimes better to rely on older language/compiler versions (major releases), which are more stable and better debugged.
Of course you get more optimized code and bug fixes in later versions, but I'm talking about using Fortran 95 instead of 2003, Python 2.5.4 instead of 3.0, and so on (especially when a new version breaks backwards compatibility). Lots of improvements usually introduce lots of bugs. Still, this will depend on the specific application!
Note that this is a personal choice; many people would argue against it.
Use redundant and automated backup! (With versioning control).
Definitely, use Subversion to keep current, work-in-progress, and stable snapshot copies of source code. This includes C++, Java etc. for homegrown software tools, and quickie scripts for one-off processing.
With the strong leaning in science and applied engineering toward a "lone cowboy" development methodology, the usual practice of organizing the repository into trunk, tags and whatever else it was: don't bother! Scientists and their lab technicians like to twirl knobs, wiggle electrodes and chase vacuum leaks. It's enough of a job to get everyone to agree on, say, Python/NumPy or to follow some naming convention; forget trying to make them follow arcane software-developer practices and conventions.
For source code management, centralized systems such as Subversion are superior for scientific use because of the clear single point of truth (SPOT). Logging of changes and the ability to recall any version of any file, without having to chase down where to find something, have huge record-keeping advantages. With tools like Git and Monotone: oh my gosh, the chaos I can imagine would follow! Having clear-cut records of just which version of those hack-job scripts was used while toying with the new sensor when that Higgs boson went by or that supernova blew up will lead to happiness.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Languages I have used for numerics and scientific-related stuff:
C (slow development, too much debugging, almost impossible to write reusable code)
C++ (and I learned to hate it -- development isn't as slow as in C, but it can be a pain. Templates and classes were cool initially, but after a while I realized that I was fighting them all the time and finding workarounds for language design problems)
Common Lisp, which was OK, but not widely used for scientific computing. Not easy to integrate with C (compared to other languages), but it works
Scheme. This one became my personal choice.
My editor is Emacs, although I do use vim for quick stuff like editing configuration files.
For plotting, I usually generate a text file and feed it into gnuplot.
For data analysis, I usually generate a text file and use GNU R.
I see lots of people here using FORTRAN (mostly 77, but some 90), lots of Java and some Python. I don't like those, so I don't use them.
Was there any training for people without any significant background in programming?
I think this doesn't apply to me, since I graduated in CS -- but where I work there is no formal training; people (engineers, physicists, mathematicians) do help each other.
Did you have anything like version control, bug tracking?
Version control is absolutely important! I keep my code and data on three different machines, on two different sides of the world, in Git repositories. I sync them all the time (so I have version control and backups!). I don't do bug tracking, although I may start doing that.
But my colleagues don't use a bug tracker or version control at all.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
First, I'd give them as much freedom as possible. (At the university where I work I could choose between having someone install Ubuntu or Windows, or installing my own OS -- I chose to install my own. I don't have support from them and I'm responsible for anything that happens to my machines, including security issues, but I do whatever I want with the machine.)
Second, I'd see what they are used to, and make it work (need FORTRAN? We'll set it up. Need C++? No problem. Mathematica? OK, we'll buy a license). Then see how many of them would like to learn "additional tools" to help them be more productive (don't say "different" tools. Say "additional", so it won't seem like anyone will "lose" or "let go" or whatever). Start with editors, see if there are groups who would like to use VCS to sync their work (hey, you can stay home and send your code through SVN or GIT -- wouldn't that be great?) and so on.
Don't impose -- show examples of how cool these tools are. Do some data analysis using R and show them how easy it was. Show nice graphics, and explain how you created them (but start with simple examples, so you can explain them quickly).
I would suggest F# as a potential candidate for performing science-related manipulations given its strong semantic ties to mathematical constructs.
Also, its support for units of measure, as written about here, makes a lot of sense for ensuring a proper translation between the mathematical model and the implementation source code.
First of all, I would definitely go with a scripting language, to avoid having to explain a lot of extra things (for example, manual memory management is (mostly) OK if you are writing low-level, performance-sensitive stuff, but for somebody who just wants to use a computer as an upgraded scientific calculator it's definitely overkill). Also, look around to see if there is something specific to your domain (as R is for statistics). This has the advantage of already working with the concepts the users are familiar with, and of having specialized code for specific situations (for example calculating standard deviations, applying statistical tests, etc. in the case of R).
If you wish to use a more generic scripting language, I would go with Python. Two things it has going for it are:
The interactive shell where you can experiment
Its clear (although sometimes lengthy) syntax
As an added advantage, it has libraries for most of the things you would want to do with it.
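As a taste of the interactive shell being used as that upgraded scientific calculator (the numbers here are made up):

    # Paste into the interactive shell; the measurements are made up.
    import numpy as np

    measurements = np.array([4.8, 5.1, 5.0, 4.9, 5.3])
    print(measurements.mean())          # 5.02
    print(measurements.std(ddof=1))     # sample standard deviation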
I'm no expert in this area, but I've always understood that this is what MATLAB was created for. There is a way to integrate MATLAB with SVN for source control as well.