Transition to OO ABAP?

It appears that most of our SAP programmers are still using the old, pre-object-oriented dialect of ABAP. I also noticed that the language is much cleaner and more modern in its OO form (SAP apparently took the opportunity to get rid of deprecated constructs).
As the system is not rolled out yet, the time to make any redesign is now rather than later.
Is it worth requesting that new code be written as OO ABAP programs? How do I sell that to management? Does the interface with non-OO programs work well?
(Update: I'm talking specifically about new code, especially what is planned for the next year.)

If it's working in production, don't rewrite your code. Not worth the time or money, and no management (at a company big enough to be running SAP) would agree to it.
Unless you move to a green field environment, you're never going to get everything re-written in OO. SAP hasn't even done that with their core ECC modules. Expecting to be able to do it with your custom stuff is unrealistic.
I would just read up on OO ABAP and start writing new programs with it.
OO ABAP and procedural ABAP work together just fine. You can call classes and methods from procedural programs and, with some limitations, vice versa.
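For illustration, here is a minimal sketch (program and class names are made up) of an ordinary procedural report that defines and calls a local class right next to a classic subroutine; this is roughly what mixing the two styles looks like in practice:

REPORT zdemo_oo_interop.

* A local class inside an otherwise procedural program
CLASS lcl_greeter DEFINITION.
  PUBLIC SECTION.
    METHODS get_text RETURNING VALUE(rv_text) TYPE string.
ENDCLASS.

CLASS lcl_greeter IMPLEMENTATION.
  METHOD get_text.
    rv_text = 'Hello from a method'.
  ENDMETHOD.
ENDCLASS.

DATA: go_greeter TYPE REF TO lcl_greeter,
      gv_text    TYPE string.

START-OF-SELECTION.
  CREATE OBJECT go_greeter.
  gv_text = go_greeter->get_text( ).   " OO call from procedural code
  WRITE / gv_text.
  PERFORM classic_output.              " the old-style subroutine still works

FORM classic_output.
  WRITE / 'Hello from a subroutine'.
ENDFORM.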

We develop a lot of new, fresh ABAP code for our customers, and the use of ABAP OO is growing slowly, but still growing.
It's easier to convince new developers to work with ABAP OO, because there is a lot less to learn. Also, writing code in OO ABAP enables the correct use of design patterns, highly effective unit testing, UI abstraction (e.g. SAPgui, WebDynpro or SAP Console), and reduces the documentation burden a lot.
Also, as some people said before, SAP isn't rewriting its codebase in ABAP OO. But they have certainly given it a try, rewriting ME51N from ME51, ME21N from ME21 and SBWP from SO01.
And all the new APIs from SAP, like ABAP Unit, ABAP Proxy, the new ALV, WebDynpro for ABAP and the all-new Enhancement and Switch Framework, are good examples (I think) of why you should give it some attention.

It depends on the size of the programs to be written. If it is a large "system" without too much database interaction, there may be some benefits. For smaller programs, I don't see any advantages of "objectifying" the code.
It also depends on the skill and preferences of the developers. If they want to "OO", there may be better environments. If they're stuck in their "old procedural" ways of thinking, there may be other ways to improve the code than switching to OO.
One example I'm commonly seeing is the discussion of "what should the database do" (e.g. joins, sorting, grouping) vs. "what should I do in code".

I just found a copy of the white paper Esti mentioned at the SAP SDN: Not Yet Using ABAP Objects? Eight Reasons Why Every ABAP Developer Should Give It a Second Look
This paper gives a brief insight in the benefits of using ABAP OO.

Try finding a copy of the white paper:
Not Yet Using ABAP Objects? Eight Reasons Why Every ABAP Developer Should Give It a Second Look by Horst Keller and Gerd Kluger.
One of the biggest advantages of ABAP OO, particularly for new SAP developers, is that it forces you to be a lot more explicit than procedural ABAP. It makes written code a lot more maintainable and may feel more familiar to programmers coming from a more mainstream background.

Old, classic reports often contain redundant code. Different reports have been built with a lot of copy and paste. Try to find the redundant pieces and pull them, step by step, out of these reports into new global, reusable, well-designed classes, then tighten the old reports by replacing the existing code with CALL METHOD calls to centrally defined, well-tested OO logic.
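A hedged sketch of the idea (all names are invented; in a real system the shared class would be a global class created in SE24 rather than a local one): the duplicated SELECT that used to be pasted into several reports now lives in one well-tested method, and each report shrinks to a CALL METHOD plus its own output logic.

REPORT zreport_refactored.

* Stand-in for a shared global class; shown locally only to keep
* the sketch self-contained.
CLASS lcl_order_reader DEFINITION.
  PUBLIC SECTION.
    TYPES ty_orders TYPE STANDARD TABLE OF vbak WITH DEFAULT KEY.
    CLASS-METHODS get_orders
      IMPORTING iv_vkorg  TYPE vkorg
      EXPORTING et_orders TYPE ty_orders.
ENDCLASS.

CLASS lcl_order_reader IMPLEMENTATION.
  METHOD get_orders.
    " The formerly copy-pasted selection logic, now in one place
    SELECT * FROM vbak INTO TABLE et_orders
      WHERE vkorg = iv_vkorg.
  ENDMETHOD.
ENDCLASS.

DATA: gt_orders TYPE lcl_order_reader=>ty_orders,
      gs_order  TYPE vbak.

START-OF-SELECTION.
  CALL METHOD lcl_order_reader=>get_orders
    EXPORTING
      iv_vkorg  = '1000'
    IMPORTING
      et_orders = gt_orders.

  LOOP AT gt_orders INTO gs_order.
    WRITE / gs_order-vbeln.      " report-specific output stays in the report
  ENDLOOP.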


Pros and Cons of using object oriented programming for progress openedge

I understand the pros and cons of using object oriented programming as a concept. What I'm looking for are the pros and cons of using oo in progress/openedge specifically. Are there challenges that I need to take into account? Are there parts of the language that don't mesh well with oo? Stuff like that.
Edit: using 10.2b
I'll give you my opinion, but be forewarned that I'm probably the biggest Progress hater out there. ;) That said, I have written several medium-sized projects in OOABL so I have some experience in the area. These are some things I wrote, just so you know I'm not talking out of my hat:
STOMP protocol framework for clients and servers
a simple ORM mimicking ActiveRecord
an ABL compiler interface for the organization I was at (backend and frontend)
a library for building up Excel and Word documents (serializing them using the MS Office 2003 XML schemas; none of that silly COM stuff)
an email client that could send emails using multiple strategies
OOABL Pros:
If you absolutely must write Progress code, it is a great option for creating reusable code.
Great way to clean up an existing procedural codebase
OOABL Cons:
Class hierarchies are limited; you can't create inherited (sub-)interfaces in 10.2B (I think this was going to be added in 11). Older versions of OpenEdge have other limitations, like the lack of abstract classes. This limits your ability to create a clean OO design and will hurt you when you start building non-trivial things.
Error handling sucks - CATCH/THROW doesn't let you throw your own custom errors and force callers to catch them. Backwards compatibility prevents this from evolving further, so I doubt it will ever improve.
The object memory footprint is large, and there are no AVM debugging tools to track down why (gotta love these closed systems!)
Garbage collection didn't exist until 10.2A, and it still has some bugs even in 11 (see the official OE forum for some examples).
Network programming (with sockets) is a PITA - you have to run a separate persistent procedure to manage the socket. Evented programming in OOABL was a PITA in general; I remember getting a lot of errors about "windowed environments" or something to that effect when trying to use it. PUBLISH/SUBSCRIBE didn't work either, if memory serves.
Depending on your environment, code reviews may be difficult, as most Progress developers don't do OOABL and so may not understand your code.
If the above point is true, you may face active resistance from entrenched developers who feel threatened by having to learn new things.
OO is all about building small, reusable pieces that can be combined to make a greater whole. A big problem with OOABL is that the “ABL” part drags you down with its coarse data structures and lack of enumerators, which prevent you from really being able to build truly beautiful things with it. Unfortunately, since it is a closed language you can’t just sidestep the hand you’re dealt and create your own new data or control structures for it externally.
Now, it is theoretically possible to try and build some of these things using MEMPTRs, fixed arrays (EXTENT), and maybe WORK-TABLEs. However, I had attempted this in 10.1C and the design fell apart due to the lack of interface inheritance and abstract classes, and as I expected, performance was quite bad. The latter part may just be due to my poor ability, but I suspect it's an implementation limitation that would be nigh impossible to surmount.
The bottom line is use OOABL if you absolutely must be coding in OpenEdge - it’s better than procedural ABL and the rough edges get slightly smoother after each iterative release of OpenEdge. However, it will never be a beautiful language (OO or otherwise).
If you want to learn proper object-oriented programming and aren’t constricted to ABL, I would highly recommend looking at a language that treats objects as a first-class citizen such as Ruby or Smalltalk.
In the last four years I have worked 80% of the time with OOABL (started with 10.1c).
I definitely recommend using OOABL but I think it is very important to consider that using OOABL the same way as in other OO languages is fraught with problems.
With "the same way" I mean design patterns and implementation practices that are common in the oo world. Also some types of applications, especially in the area of technical frameworks are hard to do with OpenEdge (e.g. ORM).
The causes are performance problems with OOABL and missing OO features in the language.
If you are programming in C# or Java for exampe, memory footprint and instantiation time of objects are not a big issue in many cases. Using ABL this becomes a big issue much more often.
This leads to other design decisions and prevents the implementation of some patterns and frameworks.
Some missing or bad OO features:
No class library, and no data structures needed for OO
No package visibility as in Java (internal in C#)
This becomes relevant especially in larger applications
No Generics
Really terrible Exception Handling
Very limited reflection capabilities (improved in oe11)
So if you are familiar with OO programming in other languages and start working with OOABL, you will probably reach a point where you miss a lot of things you expect to be there, and get frustrated when trying to implement them in ABL.
If your application has to run on Windows only, it is also possible to implement new OO code in C# and call it from your existing Progress code via the CLR bridge, which works very smoothly.
Just one thing about "Error handling sucks": it does suck, but not because you cannot create your own error classes and catch them in the caller's block - that works, and I'm using it. What sucks is the mix of the old NO-ERROR / ERROR-HANDLE options with Progress.Lang.Error / CATCH blocks and ROUTINE-LEVEL ON ERROR UNDO, THROW. That becomes a big problem when the team has no convention for which error handling will be used, and how.

OO or procedural

I have an Access db I use for my checkbook (with a good amount of fairly simple VBA behind it) and I'd like to rewrite it as a stand-alone program with a SQL backend. I'm thinking of using either C++, Java, or Python. I had assumed, before I started, that I would write it OO because I thought that I would think "in OO terms" (due to a OO Logic class and a C++ class I took), but I'm finding that I can only visualize it as procedural (but maybe because I'm mentally stuck in thinking of how the db works in Access). How do I decide? Am I making sense or does it seem like I'm not understanding the concepts?
Thanks for your help.
I'd suggest OO - it's not harder than procedural programming, actually easier to maintain with the right tool. Delphi would be my choice - great DB programming support, visual designer, strongly-typed, plenty of components available. There are many great applications that are written in Delphi. Often underestimated, there are many reasons it's got a loyal following.
Now I'll duck as the Delphi-haters load up with tomatoes.
Well, OO may well be overkill, but it is excellent practice. Any code monkey can write procedural code. It's the path of least resistance in every case, which is why most people use it for one-off apps that don't do much. However, if you're writing this to get experience working with OO, then it is best to think of it that way. You could start by designing an object that manages financial transactions, then you will also need a way to interact with the DB. Perhaps you could write a DB layer where you abstract the database calls away from the transaction object using the Entity Framework, where you could learn LINQ (or whatever the Java equivalent is). This is all assuming that you are doing this for fun and practice.
OO seems to be overkill for a simple checkbook app. Try something on a larger scale, like something to manage all your financial accounts. That way, designing an account class would make sense.
Well, it depends on your motivation. If you want a checkbook application as quickly as possible, just churn out the procedural code. No-one other than you will know the difference. If you want to use this application to better yourself as a programmer, take the time to learn how to write it in OO.
I'd go with Python: no compiling and uses dynamic typing (you can use strict typing too if you want). Plus, it has a huge following in the open source community which means great support, tools, and documentation for free.
As for OO vs. procedural -- all the languages you've mentioned can be written in a procedural style -- that is, one big class/method that does everything -- but you'll soon find that you'll want to follow the DRY principle (Don't Repeat Yourself) and start with some private methods that each do one particular thing well. From there, you'll want to group similar things into separate classes, and then from there you'll want to abstract those classes... see where I'm going here?
In my opinion you should concentrate less on the OO versus procedural thing. If you have the possibility to go procedural in the beginning, then go procedural. It's the easiest thing you can do to get you started. The OO thing, on the other hand, may just as well qualify as YAGNI (You Ain't Gonna Need It).
What you should do, though, is write tests - unit tests and then integration tests. And you should strive to write the tests first. This way, even if you begin with a procedural application, you may later refactor it into a full-fledged OO application - but only if you need objects. These tests will be your safety net when moving code around in your application.
Trying to think your application into objects from the beginning may lead you to a point where you're stuck with your class hierarchies and architecture.
I'm not a genius, so I may be wrong, but in my experience, starting with simple functions and then thinking about grouping them into objects or modules is better than starting by saying: OK, I'll have this object that interacts with this object, which is implementing pattern X, so this way I'll decouple interface Y from implementation Z. Later on, you may observe that your domain model is weak. Take an evolutionary design path and start with small building blocks.
If you are looking for a quick app that you can extend, check out Dynamic Data.

OO ABAP: When and Why? [closed]

Months after my company upgraded from 4.6c to ECC 6.0, our team of programmers is still coding in the traditional 4.7c way. I'm eager to try out the new OO approach of ABAP, but much to my dismay most people here only emphasize getting things done in the shortest time frame given.
My question would be:
1) When did people in your organization actually start coding in OO ABAP?
2) Is there any significant reason why people would want to code in an OO way? e.g. is CALL METHOD faster than a PERFORM statement?
1) When did people in your organization actually start coding in OO ABAP?
Most developers in my organisation learnt classic ABAP before the introduction of ABAP OO. They are mostly senior developers who refrain from learning proper OOP and OOD principles. They still use mostly procedural ABAP features.
Furthermore, we work in a legacy environment: the basis of our backend was built in the days of 4.6C. It is hard to bring proper OO design into legacy systems.
On the other hand, the procedural features still work. Some features like transactional database updates are mostly used from the procedural part of ABAP. You might know Update Function Modules or Subroutines exclusively for database transactions (those you can call IN UPDATE TASK). They are an integral part of the ABAP basic components. One can't deny that the procedural ABAP part is still needed.
2) Is there any significant reason why people would want to code in an OO way? e.g. is CALL METHOD faster than a PERFORM statement?
How did you compare the runtime of CALL METHOD vs. PERFORM? Did you try the program RSHOWTIM, or have you done some performance tests from the ABAP Workbench? A single subroutine call does not differ significantly from a method invocation. However, when called in a mass test, method invocations have slightly better performance (in the order of microseconds).
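If you want to repeat that comparison yourself, a throwaway report along these lines (names invented, numbers will vary by system) measures both call styles with GET RUN TIME; expect the per-call difference to be far too small to matter in ordinary code.

REPORT zperform_vs_method.

CLASS lcl_probe DEFINITION.
  PUBLIC SECTION.
    METHODS do_nothing.
ENDCLASS.

CLASS lcl_probe IMPLEMENTATION.
  METHOD do_nothing.
  ENDMETHOD.
ENDCLASS.

DATA: go_probe TYPE REF TO lcl_probe,
      gv_t0    TYPE i,
      gv_t1    TYPE i,
      gv_diff  TYPE i.

START-OF-SELECTION.
  CREATE OBJECT go_probe.

  GET RUN TIME FIELD gv_t0.
  DO 1000000 TIMES.
    PERFORM do_nothing_form.
  ENDDO.
  GET RUN TIME FIELD gv_t1.
  gv_diff = gv_t1 - gv_t0.
  WRITE: / '1,000,000 PERFORM calls (microseconds):', gv_diff.

  GET RUN TIME FIELD gv_t0.
  DO 1000000 TIMES.
    go_probe->do_nothing( ).
  ENDDO.
  GET RUN TIME FIELD gv_t1.
  gv_diff = gv_t1 - gv_t0.
  WRITE: / '1,000,000 method calls (microseconds):', gv_diff.

FORM do_nothing_form.
ENDFORM.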
On the whole, I recommend OOD and OOP with the same arguments as the users who posted before. But you have to keep in mind that senior developers familiar with the old ABAP world have to understand OO principles before they start writing ABAP OO.
Otherwise, your organization would not profit from ABAP OO - on the contrary. There are a lot of experienced ABAP developers without OO knowledge who were pushed to write classes. What they end up doing is mimicking procedural principles with classes (e.g. a class consisting exclusively of static methods, as a substitute for function modules/subroutines).
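To make that concrete, here is a hedged sketch (invented names) of the two styles side by side: the first class is just a function group in a class costume, while the second keeps its state in the object, which is what later enables polymorphism and isolated unit tests.

TYPE-POOLS abap.

* Procedural habits in a class: static methods only, no state
CLASS lcl_order_utils DEFINITION.
  PUBLIC SECTION.
    CLASS-METHODS is_overdue
      IMPORTING iv_due_date      TYPE d
      RETURNING VALUE(rv_result) TYPE abap_bool.
ENDCLASS.

CLASS lcl_order_utils IMPLEMENTATION.
  METHOD is_overdue.
    IF iv_due_date < sy-datum.
      rv_result = abap_true.
    ENDIF.
  ENDMETHOD.
ENDCLASS.

* Object-oriented alternative: the order carries its own state
CLASS lcl_order DEFINITION.
  PUBLIC SECTION.
    METHODS constructor IMPORTING iv_due_date TYPE d.
    METHODS is_overdue  RETURNING VALUE(rv_result) TYPE abap_bool.
  PRIVATE SECTION.
    DATA mv_due_date TYPE d.
ENDCLASS.

CLASS lcl_order IMPLEMENTATION.
  METHOD constructor.
    mv_due_date = iv_due_date.
  ENDMETHOD.
  METHOD is_overdue.
    IF mv_due_date < sy-datum.
      rv_result = abap_true.
    ENDIF.
  ENDMETHOD.
ENDCLASS.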
Best of luck for your organisation for your challenge with ABAP OO! It is not about the language, it is more about getting OO principles into the mind of your staff.
I don't know about ABAP, but I have seen the same happen with VB developers moving to the .Net platform.
Programmers are comfortable in their old way of programming, and the old way still works. The new way of programming takes a lot of investment, not only from the company but also from people who have to move out of their comfort zone into uncertain territory. If your company is unwilling to invest in training and time for research this problem will get bigger because people will have to invest their own time, not everyone is willing to do that.
As Taurean already showed there are convincing reasons to move to the OO way of doing things. They're mostly not about performance but about better decoupling of components in your system making it far more maintainable.
But in my experience it's hard to convince people to move out of their comfort zone using reasonable arguments like that. It usually works better to show them the way. Slowly start using OO constructs in your own code and show people how clean it looks. This isn't something you'll achieve in months; it can take years to get people to think and work differently.
A team of experienced procedural developers is unlikely to start developing in an OO style anytime soon, unless a significant (and expensive) effort is made to train and coach them.
There are numerous reasons for that:
It takes about a year of immersion in a real OO environment (Smalltalk, not Java or C++) to get any good at OO development.
They cannot start from scratch, there is a lot of legacy code, and time pressure.
All their legacy code is not OO. It takes a significant effort to restructure.
The legacy code is not well structured and has lots of duplication and no unit tests. Changing it takes too much time, so they don't have time to fix things. (It's amazing what you can deduce from a project without knowing anything about it. :) ).
As a consequence, their new code will most likely be procedural but in classes and methods. They will not be impressed by the advantages of OO.
Some good reasons to switch to ABAP OO are:
ABAP OO is more explicit and simpler to use
ABAP OO has a stricter syntax check which removes a lot of the ambiguity in the ABAP language
Much of the new Netweaver functionality is only available using OO
Add this to the benefits listed by Taurean:
Better data encapsulation
Multiple instantiation
Better garbage collection
Code reuse through inheritance
Manipulate business objects through standard interfaces
Event-driven programming
Starting to use ABAP OO:
Start by calling some SAP standard OO functionality in your code: use the ALV classes rather than the function modules - the classes provide much more functionality. Try calling some of the standard methods in the CL_ABAP* or CL_GUI_FRONTEND* classes.
Write a report as a singleton using local classes (a minimal sketch follows after this list)
Try designing a simple class in SE24 for something that is familiar to you (a file-handler for instance)
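For the singleton suggestion above, a minimal local-class sketch (hypothetical names) could look like the following; the class is defined with CREATE PRIVATE so the only way to obtain an instance is through GET_INSTANCE.

REPORT zsingleton_demo.

CLASS lcl_report DEFINITION CREATE PRIVATE.
  PUBLIC SECTION.
    CLASS-METHODS get_instance
      RETURNING VALUE(ro_instance) TYPE REF TO lcl_report.
    METHODS run.
  PRIVATE SECTION.
    CLASS-DATA go_instance TYPE REF TO lcl_report.
ENDCLASS.

CLASS lcl_report IMPLEMENTATION.
  METHOD get_instance.
    IF go_instance IS INITIAL.
      CREATE OBJECT go_instance.    " allowed here because of CREATE PRIVATE
    ENDIF.
    ro_instance = go_instance.
  ENDMETHOD.

  METHOD run.
    " selection, processing and output would go here
    WRITE / 'Report logic running inside the singleton instance'.
  ENDMETHOD.
ENDCLASS.

DATA go_report TYPE REF TO lcl_report.

START-OF-SELECTION.
  go_report = lcl_report=>get_instance( ).
  go_report->run( ).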
Resources:
Design Patterns in Object-Oriented ABAP by Igor Barbaric
Not yet using ABAP Objects? Eight Reasons why Every ABAP developer should give it a second look. by Horst Keller and Gerd Kluger
OO or not OO is not the question!
The question is where to use OO and where not to.
All the advantages of the OO approach (OOD and OOP) can be fully exploited as long as you stay in the customer namespace. However, every access to SAP standard functionality creates huge headaches.
Transactional integrity, object consistency and synchronisation, DB commits, screens (module pools and selection screens), authority checks, batch input: these are just some of the areas that are difficult (or even impossible) to integrate into an OO approach. Integrating SAP standard modules raises the complexity even further.
User exits, events: most of the data is provided through the interface. Access to customer-specific data or customizing can be placed in objects.
Reports: most of the data will be read via standard function modules. Specific data processing can be placed in objects and easily reused in other reports. SAP Enjoy controls can be wrapped in an object shell for easy use and reuse. Screens can NOT be placed in objects. :-(((
Core processing: replacing SAP business object maintenance or SAP processes is not encouraged by SAP. But if that is what you need, be patient and be ready for a huge effort. Looking closer, there are a lot of technical challenges: the singleton pattern, optimisation of DB access, locking, synchronisation, etc. The separation of technical and business functionality has to be addressed, and since objects are not really suitable for mass processing (high DB load), mass processing needs to be addressed separately as well.
Below are some of the advantages of OOP, as you probably already know:
Data Encapsulation
Instantiation
Code Reuse
Interfaces
Taking advantage of these, there are many important reasons for using OO ABAP whenever possible. Even if you do not want to use OO programming as such, using ABAP Objects is still a good idea, since it provides some features that procedural programming does not.
So here's what ABAP Objects offers you over Procedural ABAP:
Better encapsulation
Support for multiple instantiation
Better techniques for reusing code
Better interfaces
An explicit event concept
There are only two purposes for which Procedural ABAP is found essential:
Encapsulation of classic screens in function modules.
When you want to make functions available to other systems, but are not able to make class methods available externally using XI server proxies. In this case, you have to use function modules.
Study them in detail here and you will see that you don't need any further operational or demonstrative reason to convince yourself to move to OO ABAP, because all these reasons are already very significant.
To put it simply: use it when you have a relatively young team who are eager and ready to learn a new programming paradigm. In a senior-dominated team, adopting OO could be challenging - more so because the maintainability of the program goes down, and the organisation may need new employees to maintain the OO code.
From a design perspective there is no question (as a lot of people have also said in this forum) that it's the best approach and has been in use for ages. SAP is way behind in terms of technology: their ECC DB design is still in 2NF, and the standard 3NF is what they've called a '3D' database.
Not to deviate too much from the main topic - I believe you now have more than enough good answers to reach a decision.

Practices for programming in a scientific environment? [closed]

Background
Last year, I did an internship in a physics research group at a university. In this group, we mostly used LabVIEW to write programs for controlling our setups, doing data acquisition and analyzing our data. For the first two purposes, that works quite OK, but for data analysis, it's a real pain. On top of that, everyone was mostly self-taught, so code that was written was generally quite a mess (no wonder that every PhD quickly decided to rewrite everything from scratch). Version control was unknown, and impossible to set up because of strict software and network regulations from the IT department.
Now, things actually worked out surprisingly OK, but how do people in the natural sciences do their software development?
Questions
Some concrete questions:
What languages/environments have you used for developing scientific software, especially data analysis? What libraries? (for example, what do you use for plotting?)
Was there any training for people without any significant background in programming?
Did you have anything like version control, and bug tracking?
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (especially physicists are stubborn people!)
Summary of answers thus far
The answers (or my interpretation of them) thus far: (2008-10-11)
Languages/packages that seem to be the most widely used:
LabVIEW
Python
with SciPy, NumPy, PyLab, etc. (See also Brandon's reply for downloads and links)
C/C++
MATLAB
Version control is used by nearly all respondents; bug tracking and other processes are much less common.
The Software Carpentry course is a good way to teach programming and development techniques to scientists.
How to improve things?
Don't force people to follow strict protocols.
Set up an environment yourself, and show the benefits to others. Help them to start working with version control, bug tracking, etc. themselves.
Reviewing other people's code can help, but be aware that not everyone may appreciate that.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I used to work for Enthought, the primary corporate sponsor of SciPy. We collaborated with scientists from the companies that contracted Enthought for custom software development. Python/SciPy seemed to be a comfortable environment for scientists. It's much less intimidating to get started with than say C++ or Java if you're a scientist without a software background.
The Enthought Python Distribution comes with all the scientific computing libraries, including analysis, plotting, 3D visualisation, etc.
Was there any training for people without any significant background in programming?
Enthought does offer SciPy training and the SciPy community is pretty good about answering questions on the mailing lists.
Did you have anything like version control, bug tracking?
Yes, and yes (Subversion and Trac). Since we were working collaboratively with the scientists (and typically remotely from them), version control and bug tracking were essential. It took some coaching to get some scientists to internalize the benefits of version control.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
Make sure they are familiarized with the tool chain. It takes an investment up front, but it will make them feel less inclined to reject it in favor of something more familiar (Excel). When the tools fail them (and they will), make sure they have a place to go for help — mailing lists, user groups, other scientists and software developers in the organization. The more help there is to get them back to doing physics the better.
The course Software Carpentry is aimed specifically at people doing scientific computing and aims to teach the basics and lessons of software engineering, and how best to apply them to projects.
It covers topics like version control, debugging, testing, scripting and various other issues.
I've listened to about 8 or 9 of the lectures and think they come highly recommended.
Edit: The MP3s of the lectures are available as well.
Nuclear/particle physics here.
Major programming work used to be done mostly in Fortran using CERNLIB (PAW, MINUIT, ...) and GEANT3; recently it has mostly been done in C++ with ROOT and Geant4. There are a number of other libraries and tools in specialized use, and LabVIEW sees some use here and there.
Data acquisition in my end of this business has often meant fairly low level work. Often in C, sometimes even in assembly, but this is dying out as the hardware gets more capable. On the other hand, many of the boards are now built with FPGAs which need gate twiddling...
One-offs, graphical interfaces, etc. use almost anything (Tcl/Tk used to be big, and I've been seeing more Perl/Tk and Python/Tk lately) including a number of packages that exist mostly inside the particle physics community.
Many people writing code have little or no formal training, and process is transmitted very unevenly by oral tradition, but most of the software group leaders take process seriously and read as much as necessary to make up their deficiencies in this area.
Version control for the main tools is ubiquitous. But many individual programmers neglect it for their smaller tasks. Formal bug tracking tools are less common, as are nightly builds, unit testing, and regression tests.
To improve things:
Get on the good side of the local software leaders
Implement the process you want to use in your own area, and encourage those you let in to use it too.
Wait. Physicists are empirical people. If it helps, they will (eventually!) notice.
One more suggestion for improving things.
Put a little time in to helping anyone you work directly with. Review their code. Tell them about algorithmic complexity/code generation/DRY or whatever basic thing they never learned because some professor threw a Fortran book at them once and said "make it work". Indoctrinate them on process issues. They are smart people, and they will learn if you give them a chance.
This might be slightly tangential, but hopefully relevant.
I used to work for National Instruments, R&D, where I wrote software for NI RF & Communication toolkits. We used LabVIEW quite a bit, and here are the practices we followed:
Source control. NI uses Perforce. We did the regular thing - dev/trunk branches, continuous integration, the works.
We wrote automated test suites.
We had a few people who came in with a background in signal processing and communication. We used to have regular code reviews, and best practices documents to make sure their code was up to the mark.
Despite the code reviews, there were a few occasions when "software guys", like me had to rewrite some of this code for efficiency.
I know exactly what you mean about stubborn people! We had folks who used to think that pointing out a potential performance improvement in their code was a direct personal insult! It goes without saying that this calls for good management. I thought the best way to deal with these folks was to go slowly, not press too hard for changes, and if necessary be prepared to do the dirty work yourself. [Example: write a test suite for their code.]
I'm not exactly a 'natural' scientist (I study transportation) but am an academic who writes a lot of my own software for data analysis. I try to write as much as I can in Python, but sometimes I'm forced to use other languages when I'm working on extending or customizing an existing software tool. There is very little programming training in my field. Most folks are either self-taught, or learned their programming skills from classes taken previously or outside the discipline.
I'm a big fan of version control. I used Vault running on my home server for all the code for my dissertation. Right now I'm trying to get the department to set up a Subversion server, but my guess is I will be the only one who uses it, at least at first. I've played around a bit with FogBugz, but unlike version control, I don't think that's nearly as useful for a one-man team.
As for encouraging others to use version control and the like, that's really the problem I'm facing now. I'm planning on forcing my grad students to use it on research projects they're doing for me, and encouraging them to use it for their own research. If I teach a class involving programming, I'll probably force the students to use version control there too (grading them on what's in the repository). As far as my colleagues and their grad students go, all I can really do is make a server available and rely on gentle persuasion and setting a good example. Frankly, at this point I think it's more important to get them doing regular backups than get them on source control (some folks are carrying around the only copy of their research data on USB flash drives).
1.) Scripting languages are popular these days for most things due to better hardware. Perl/Python/Lisp are prevalent for lightweight applications (automation, light computation); I see a lot of Perl at my work (computational EM) since we like Unix/Linux. For performance stuff, C/C++/Fortran are typically used. For parallel computing, well, we usually manually parallelize runs in EM as opposed to having a program implicitly do it (ie split up the jobs by look angle when computing radar cross sections).
2.) We just kind of throw people into the mix here. A lot of the code we have is very messy, but scientists are typically a scatterbrained bunch that don't mind that sort of thing. Not ideal, but we have things to deliver and we're severely understaffed. We're slowly getting better.
3.) We use SVN; however, we do not have bug tracking software. About as good as it gets for us is a txt file that tells you where specific bugs are.
4.) My suggestion for implementing best practices for scientists: do it slowly. As scientists, we typically don't ship products. No one in science makes a name for himself by having clean, maintainable code. They get recognition from the results of that code, typically. They need to see justification for spending time on learning software practices. Slowly introduce new concepts and try to get them to follow; they're scientists, so after their own empirical evidence confirms the usefulness of things like version control, they will begin to use it all the time!
I'd highly recommend reading "What Every Computer Scientist Should Know About Floating-Point Arithmetic". A lot of problems I encounter on a regular basis come from issues with floating point programming.
I am a physicist working in the field of condensed matter physics, building classical and quantum models.
Languages:
C++ -- very versatile: can be used for anything, good speed, but it can be a bit inconvenient when it comes to MPI
Octave -- good for some supplementary calculations, very convenient and productive
Libraries:
Armadillo/Blitz++ -- fast array/matrix/cube abstractions for C++
Eigen/Armadillo -- linear algebra
GSL -- to use with C
LAPACK/BLAS/ATLAS -- extremely big and fast, but less convenient (and written in FORTRAN)
Graphics:
GNUPlot -- it has very clean and neat output, but not that productive sometimes
Origin -- very convenient for plotting
Development tools:
Vim + plugins -- it works great for me
GDB -- a great debugging tool when working with C/C++
Code::Blocks -- I used it for some time and found it quite comfortable, but Vim is still better in my opinion.
I work as a physicist in a UK university.
Perhaps I should emphasise that different areas of research have different emphasis on programming. Particle physicists (like dmckee) do computational modelling almost exclusively and may collaborate on large software projects, whereas people in fields like my own (condensed matter) write code relatively infrequently. I suspect most scientists fall into the latter camp. I would say coding skills are usually seen as useful in physics, but not essential, much like physics/maths skills are seen as useful for programmers but not essential. With this in mind...
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Commonly, data analysis and plotting are done using generic data analysis packages such as IGOR Pro, ORIGIN or KaleidaGraph, which can be thought of as 'Excel plus'. These packages typically have a scripting language that can be used to automate things. More specialist analysis may have a dedicated utility for the job that generally will have been written a long time ago, that no-one has the source for, and that is pretty buggy. Some more techie types might use the languages that have been mentioned (Python, R, MATLAB, with Gnuplot for plotting).
Control software is commonly done in LabVIEW, although we actually use Delphi which is somewhat unusual.
Was there any training for people without any significant background in programming?
I've been to seminars on grid computing, 3D visualisation, learning Boost, etc., given by both universities I've been at. As an undergraduate we were taught VBA for Excel and MATLAB, but C/MATLAB/LabVIEW is more common.
Did you have anything like version control, bug tracking?
No, although people do have personal development setups. Our code base is in a shared folder on a 'server' which is kept current with a synching tool.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
One step at a time! I am trying to replace the shared folder with something a bit more solid, perhaps finding a SVN client which mimics the current synching tools behaviour would help.
I'd say though on the whole, for most natural science projects, time is generally better spent doing research!
Ex-academic physicist and now industrial physicist UK here:
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I mainly use MATLAB these days (easy to access visualisation functions and maths). I used to use Fortran a lot and IDL. I have used C (but I'm more a reader than a writer of C), Excel macros (ugly and confusing). I'm currently needing to be able to read Java and C++ (but I can't really program in them) and I've hacked Python as well. For my own entertainment I'm now doing some programming in C# (mainly to get portability / low cost / pretty interfaces). I can write Fortran with pretty much any language I'm presented with ;-)
Was there any training for people without any significant background in programming?
Most (all?) undergraduate physics courses will have a small programming course, usually in C, Fortran or MATLAB, but it's the real basics. I'd really like to have had some training in software engineering at some point (revision control / testing / designing medium-scale systems).
Did you have anything like version control, bug tracking?
I started using Subversion / TortoiseSVN relatively recently. Groups I've worked with in the past have used revision control. I don't know any academic group which uses formal bug tracking software. I still don't use any sort of systematic testing.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
I would try to introduce some software engineering ideas at undergraduate level and then reinforce them by practice at graduate level, also provide pointers to resources like the Software Carpentry course mentioned above.
I'd expect that a significant fraction of academic physicists will be writing software (not necessarily all though) and they are in dire need of at least an introduction to ideas in software engineering.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Python, NumPy and pylab (plotting).
Was there any training for people without any significant background in programming?
No, but I was working in a multimedia research lab, so almost everybody had a computer science background.
Did you have anything like version control, bug tracking?
Yes, Subversion for version control, Trac for bug tracing and wiki. You can get free bug tracker/version control hosting from http://www.assembla.com/ if their TOS fits your project.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!).
Make sure the infrastructure is set up and well maintained and try to sell the benefits of source control.
I'm a statistician at a university in the UK. Generally people here use R for data analysis, it's fairly easy to learn if you know C/Perl. Its real power is in the way you can import and modify data interactively. It's very easy to take a number of say CSV (or Excel) files and merge them, create new columns based on others and then throw that into a GLM, GAM or some other model. Plotting is trivial too and doesn't require knowledge of a whole new language (like PGPLOT or GNUPLOT.) Of course, you also have the advantage of having a bunch of built-in features (from simple things like mean, standard deviation etc all the way to neural networks, splines and GL plotting.)
Having said this, there are a couple of issues. With very large datasets R can become very slow (I've only really seen this with >50,000x30 datasets) and since it's interpreted you don't get the advantage of Fortran/C in this respect. But, you can (very easily) get R to call C and Fortran shared libraries (either from something like netlib or ones you've written yourself.) So, a usual workflow would be to:
Work out what to do.
Prototype the code in R.
Run some preliminary analyses.
Re-write the slow code into C or Fortran and call that from R.
Which works very well for me.
I'm one of the only people in my department (of >100 people) using version control (in my case using git with github.com). This is rather worrying, but they just don't seem to be keen on trying it out and are content with passing zip files around (yuck).
My suggestion would be to continue using LabVIEW for the acquisition (and perhaps try to get your co-workers to agree on a toolset for acquisition and make it available to all), and then move to exporting the data into a CSV (or similar) and doing the analysis in R. There's really very little point in re-inventing the wheel in this respect.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
My undergraduate physics department taught LabVIEW classes and used it extensively in its research projects.
The other alternative is MATLAB, in which I have no experience. There are camps for either product; each has its own advantages/disadvantages. Depending on what kind of problems you need to solve, one package may be more preferable than the other.
Regarding data analysis, you can use whatever kind of number cruncher you want. Ideally, you can do the hard calculations in language X and format the output to plot nicely in Excel, Mathcad, Mathematica, or whatever the flavor du jour plotting system is. Don't expect standardization here.
Did you have anything like version control, bug tracking?
Looking back, we didn't, and it would have been easier for us all if we did. Nothing like breaking everything and struggling for hours to fix it!
Definitely use source control for any common code. Encourage individuals to write their code in a manner that could be made more generic. This is really just coding best practices. Really, you should have them teaching (or taking) a computer science class so they can get the basics.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
There is a clear split between data acquisition (DAQ) and data analysis. Meaning, it's possible to standardize on the DAQ side and then allow the scientists to play with the data in the program of their choice.
Another good option is Scilab. It has graphic modules à la LabVIEW, it has its own programming language and you can also embed Fortran and C code, for example. It's being used in public and private sectors, including big industrial companies. And it's free.
About versioning, some prefer Mercurial, as it gives more liberties managing and defining the repositories. I have no experience with it, however.
For plotting I use Matplotlib. I will soon have to make animations, and I've seen good results using MEncoder. Here is an example including an audio track.
Finally, I suggest going modular, this is, trying to keep main pieces of code in different files, so code revision, understanding, maintenance and improvement will be easier. I have written, for example, a Python module for file integrity testing, another for image processing sequences, etc.
You should also consider developing with the use of a debugger that allows you to check variable contents at settable breakpoints in the code, instead of using print lines.
I have used Eclipse for Python and Fortran developing (although I got a false bug compiling a Fortran short program with it, but it may have been a bad configuration) and I'm starting to use the Eric IDE for Python. It allows you to debug, manage versioning with SVN, it has an embedded console, it can do refactoring with Bicycle Repair Man (it can use another one, too), you have Unittest, etc. A lighter alternative for Python is IDLE, included with Python since version 2.3.
As a few hints, I also suggest:
Not using single-character variables. When you search for their occurrences, you will get results everywhere. Some argue that a decent IDE makes this easier, but then you will depend on having permanent access to the IDE. Even using ii, jj and kk can be enough, although this choice will depend on your language. (Double vowels would be less useful if code comments are made in Estonian, for instance.)
Commenting the code from the very beginning.
For critical applications sometimes it's better to rely on older language/compiler versions (major releases), more stable and better debugged.
Of course you can have more optimized code in later versions, fixed bugs, etc., but I'm talking about using Fortran 95 instead of 2003, Python 2.5.4 instead of 3.0, and so on. (Especially when a new version breaks backwards compatibility.) Lots of improvements usually introduce lots of bugs. Still, this will depend on the specific application!
Note that this is a personal choice, many people could argue against this.
Use redundant and automated backup! (With versioning control).
Definitely, use Subversion to keep current, work-in-progress, and stable snapshot copies of source code. This includes C++, Java etc. for homegrown software tools, and quickie scripts for one-off processing.
With the strong leaning in science and applied engineering toward "lone cowboy" development methodology, the usual practice of organizing the repository into trunk, tag and whatever else it was - don't bother! Scientists and their lab technicians like to twirl knobs, wiggle electrodes and chase vacuum leaks. It's enough of a job to get everyone to agree to, say Python/NumPy or follow some naming convention; forget trying to make them follow arcane software developer practices and conventions.
For source code management, centralized systems such as Subversion are superior for scientific use due to the clear single point of truth (SPOT). Logging of changes and the ability to recall versions of any file, without having to chase down where to find something, has huge record-keeping advantages. Tools like Git and Monotone: oh my gosh, the chaos I can imagine would follow! Having clear-cut records of just which version of the hack-job scripts was in use while toying with the new sensor when that Higgs boson went by or that supernova blew up will lead to happiness.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Languages I have used for numerics and science-related stuff:
C (slow development, too much debugging, almost impossible to write reusable code)
C++ (and I learned to hate it -- development isn't as slow as in C, but it can be a pain. Templates and classes were cool initially, but after a while I realized that I was fighting them all the time and finding workarounds for language design problems)
Common Lisp, which was OK, but not widely used for scientific computing. Not easy to integrate with C (compared to other languages), but it works
Scheme. This one became my personal choice.
My editor is Emacs, although I do use vim for quick stuff like editing configuration files.
For plotting, I usually generate a text file and feed it into gnuplot.
For data analysis, I usually generate a text file and use GNU R.
I see lots of people here using FORTRAN (mostly 77, but some 90), lots of Java and some Python. I don't like those, so I don't use them.
Was there any training for people without any significant background in programming?
I think this doesn't apply to me, since I graduated in CS -- but where I work there is no formal training, but people (Engineers, Physicists, Mathematicians) do help each other.
Did you have anything like version control, bug tracking?
Version control is absolutely important! I keep my code and data on three different machines, on two different sides of the world -- in Git repositories. I sync them all the time (so I have version control and backups!). I don't do bug tracking, although I may start doing that.
But my colleagues don't use a BTS or VCS at all.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
First, I'd give them as much freedom as possible. (In the university where I work I could choose between having someone install Ubuntu or Windows, or installing my own OS -- I chose to install my own. I don't have support from them and I'm responsible for anything that happens with my machines, including security issues, but I do whatever I want with the machine.)
Second, I'd see what they are used to, and make it work (need FORTRAN? We'll set it up. Need C++? No problem. Mathematica? OK, we'll buy a license). Then see how many of them would like to learn "additional tools" to help them be more productive (don't say "different" tools. Say "additional", so it won't seem like anyone will "lose" or "let go" or whatever). Start with editors, see if there are groups who would like to use VCS to sync their work (hey, you can stay home and send your code through SVN or GIT -- wouldn't that be great?) and so on.
Don't impose -- show examples of how cool these tools are. Make data analysis using R, and show them how easy it was. Show nice graphics, and explain how you've created them (but start with simple examples, so you can quickly explain them).
I would suggest F# as a potential candidate for performing science-related manipulations given its strong semantic ties to mathematical constructs.
Also, its support for units-of-measure, as written about here makes a lot of sense for ensuring proper translation between mathematical model and implementation source code.
First of all, I would definitely go with a scripting language to avoid having to explain a lot of extra things (for example manual memory management is - mostly - ok if you are writing low-level, performance sensitive stuff, but for somebody who just wants to use a computer as an upgraded scientific calculator it's definitely overkill). Also, look around if there is something specific for your domain (as is R for statistics). This has the advantage of already working with the concepts the users are familiar with and having specialized code for specific situations (for example calculating standard deviations, applying statistical tests, etc in the case of R).
If you wish to use a more generic scripting language, I would go with Python. Two things it has going for it are:
The interactive shell where you can experiment
Its clear (although sometimes lengthy) syntax
As an added advantage, it has libraries for most of the things you would want to do with it.
I'm no expert in this area, but I've always understood that this is what MATLAB was created for. There is a way to integrate MATLAB with SVN for source control as well.

Which scripting language to support in an existing codebase?

I'm looking at adding scripting functionality to an existing codebase and am weighing up the pros/cons of various packages. Lua is probably the most obvious choice, but I was wondering if people have any other suggestions based on their experience.
Scripts will be triggered upon certain events and may stay resident for a period of time. For example upon startup a script may define several options which the program presents to the user as a number of buttons. Upon selecting one of these buttons the program will notify the script where further events may occur.
These are the only real requirements:
Must be a cross-platform library that is compilable from source
Scripts must be able to call registered code-side functions
Code must be able to call script-side functions
Be used within a C/C++ codebase.
Based on my own experience:
Python. IMHO this is a good choice. We have a pretty big code base with a lot of users and they like it a lot.
Ruby. There are some really nice apps such as Google Sketchup that use this. I wrote a Sketchup plugin and thought it was pretty nice.
Tcl. This is the old-school embeddable scripting language of choice, but it doesn't have a lot of momentum these days. It's high quality though, they use it on the Hubble Space Telescope!
Lua. I've only done baby stuff with it but IIRC it only has a floating point numeric type, so make sure that's not a problem for the data you will be working with.
We're lucky to be living in the golden age of scripting, so it's hard to make a bad choice if you choose from any of the popular ones.
I have played around a little bit with SpiderMonkey. It seems like it would at least be worth a look in your situation. I have heard good things about Lua as well. The big argument for using a JavaScript scripting engine is that a lot of developers already know the language and would probably be more comfortable from the get-go, whereas Lua would most likely involve a bit of a learning curve.
I'm not completely positive, but I think SpiderMonkey meets your 4 requirements.
I've used Python extensively for this purpose and have never regretted it.
Lua has the most straightforward C API for binding into a code base that I've ever used. In fact, I usually quickly roll bindings for it by hand, whereas you often wouldn't consider doing so for other languages without a generator like SWIG. Also, it's typically faster and more lightweight than the alternatives, and coroutines are a very useful feature that few other languages provide.
AngelScript
lets you call standard C functions and C++ methods with no need for proxy functions. The application simply registers the functions, objects, and methods that the scripts should be able to work with and nothing more has to be done with your code. The same functions used by the application internally can also be used by the scripting engine, which eliminates the need to duplicate functionality.
For the script writer the scripting language follows the widely known syntax of C/C++ (with minor changes), but without the need to worry about pointers and memory leaks.
The original question described Tcl to a "T".
Tcl was designed from the beginning to be an embedded scripting language. It has evolved to be a first-class dynamic language in its own right, but it is still used all over the world as an embedded language. It is available under the BSD license, so it is just about as free as it gets. It also compiles on pretty much any modern platform, and many not-so-modern ones. And not only does it work on desktop systems, there are variations available for mobile platforms.
Tcl excels as a "glue" language, where you can write performance-intensive functions in C while still benefiting from the advantages of a scripting language for less performance critical parts of the application.
Tcl also comes with a first class GUI toolkit (Tk) that is arguably one of the easiest cross platform GUI toolkits available. It also interfaces very nicely with SQLite and other databases, and has had built-in support for unicode for quite some time.
If the scripting interface will be made available to your customers (as opposed to simply enabling your own engineers to work at the scripting level), Tcl is extremely easy to learn as there are a total of only 12 rules that govern the entire language (as of tcl 8.6). In fact, Tcl shines as a way to invent domain specific languages which is often how it is used as an end-user scripting solution.
There were some excellent suggestions already, but I just wanted to mention that Perl can also be called / can call to C/C++.
You probably could use any modern scripting / bytecode language.
If you're willing to put up with the growing pains of a new product, you could use the Parrot VM, which has support for many, if not all, of the languages listed on this page. Unfortunately it's not done yet, but that hasn't stopped some people from using it in a production environment.
I think most people are probably mentioning the scripting language that they are most familiar with. From my perspective, Tcl was designed specifically to interface with C, so your problem domain is tailor-made for the language. However, I'm sure Python, Perl, or Lua would be fine. You should probably choose the language that is most familiar to your current team, since that will reduce the learning time.