How hard is it for a software developer to learn how to program a microcontroller? - embedded

I'm a software developer. I've been programming in high level languages for a few years.
I would like to know, how to take my first step into programming hardware. Not something crazy complicated, but maybe some ordinary CE device? Assuming I don't need to put the PCB together with varies components, but just to program the tiny cpu?
How low-level do I have to go? ASM? C? manipulating registers? or are the dev kit quite high level now? Is Java even in the picture? OO coding in hardware, is that even a dream or a reality? Need a reality check.
I also tend to learn better with books or sites that are written in a tutorial format. Something that guides the way for me from something simple to something more complex. Any recommendations? Maybe something that will introduce me to the popular hardware (microprocessor/micro-controller) available today?
Much appreciated, thank you everyone.

The actual programming isn't a big deal. The frustrating, annoying part is getting your development environment set up and getting the tools working.
I'd suggest buying a development kit ('dev kit') that has USB built in and works with your chosen OS. Get that working, and you're halfway done.
If you don't already have it, it's also important to learn the basics of how a processor works. You'll be programming at a much lower level than in any other kind of programming, so the fundamentals matter a bit more.

If you know C, then it's only a matter of learning the toolchain steps to download the code.
Easy place to start (cheap hardware/software) http://www.arduino.cc/en/Guide/HomePage

I have been coding in C both as a hobby and professionally for about 16 years now, but always for userland code (i.e., programs, not kernel or drivers). Most of my jobs involved high level languages (I have done a lot of Perl and Ruby programming, with the occasional Java, Python and shell scripting in between). I did develop a lot for MS-DOS (which was probably as close to bare-metal programming as you would get on a x86 machine), but my last job involved 5 years of Perl and Ruby on Rails web development.
That being said, I am now a senior engineer for embedded Linux development, developing drivers (including an emulator for a legacy simple microprocessor inside a kernel module) for uClinux on the Blackfin platform. There are times when my inexperience with hardware related issues (i.e., floating signal levels due to lack of a pull-up/pull-down on a pin) did get in the way, but it has been mostly a highly enjoyable and thrilling experience. As stated by others, understanding your tools is essential -- for uClinux, that meant the GNU Toolchain, which fortunately I was already familiar with due to my background on FOSS technologies.
The Blackfin is hardly an entry-level microprocessor (in particular, it does not have an MMU, which has some relevant effects on Linux development), but as already stated, you can buy a Beagleboard for around US$200 with all required accessories and start messing around with it in just a few days. If you want something simpler, there are many Arduino options out there, though if you have some real development experience under your belt I believe you will find their development environment a little limiting (I know I did).
After you get comfortable with your tools you might want to spend some money on an in-circuit emulator (or ICE). These are usually highly platform specific (both in terms of target architecture and development environment), but are highly recommended for anything beyond the usual blink-LEDs-after-button-press examples I am sure you will quickly outgrow.
In a few months you will find yourself building custom images for hackable consumer devices using Buildroot and having a lot of fun. All I can say is, go for it; it's highly addictive and not particularly expensive to do nowadays.

Something else to look into is Microsoft Robotics Studio. It supports quite a lot of hardware boards (including CE ones), and with it, it is fairly easy to get a small robot up and running. And what cooler project is there for learning embedded programming?
It all integrates nicely into Visual Studio (Express), and their dev kit also comes with a free Express edition.

Get a beagleboard. Cheap, lots of users (community support will be key), many OS options. http://beagleboard.org/

Well, if you want to know what you're doing, you need to understand the assembly language of the processor and the processor's architecture.
You will need to learn C to be competent in microcontrollers. There is no way around that.
There are some VM-level languages on embedded systems. I see the Java out-of-memory exception from time to time on my cell phone (which also helps give me a strong opinion on VM-level embedded languages).
The ARM has some support for hardware-level Java bytecodes.
Your best bet is to pick up something like the PIC or the Atmel chips and begin hacking with them.
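To give a flavor of what "manipulating registers" looks like in practice, here is a minimal bare-metal C sketch for an Atmel AVR (assuming avr-gcc/avr-libc and an ATmega328-style part; the register and bit names are specific to that chip):

    /* Blink the LED on PB5 by writing the port registers directly
       (avr-gcc + avr-libc). */
    #define F_CPU 16000000UL          /* CPU clock; needed by util/delay.h */
    #include <avr/io.h>
    #include <util/delay.h>

    int main(void)
    {
        DDRB |= _BV(DDB5);            /* data-direction register: PB5 is an output */
        for (;;) {
            PORTB ^= _BV(PORTB5);     /* toggle the output latch */
            _delay_ms(500);
        }
    }

The Arduino environment hides exactly this kind of code behind pinMode()/digitalWrite(), which is why it is such a gentle first step.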
If you want to do it with your existing hardware, get a hypervisor for your PC and begin writing a basic kernel.

Related

Would like some pointers for Xojo

My company is planning to build a simulation tool for (beverage) processing, and we're currently looking at a half-baked system written in Xojo. I had personally never heard of this language and would appreciate it if anyone could give a quick assessment.
We have no in-house Xojo competence at all and are of course reluctant to bring in a system that would require a big investment in know-how for just one system.
So, we're now looking at our options: Port it to a language we're good at (C# or Java) or continue development in Xojo while building internal skills for the language.
So, what are the big pros and cons of Xojo?
Cheers
Xojo has been around since the late 1990s, originally under the name REALbasic. Its strength lies in its ability to make native-looking and native-behaving apps for many platforms, mainly OS X but also Windows and even Linux. The dev community is fairly small, though. But the company has managed to stay in business all this time and isn't looking to end it any time soon.
The language is fairly simple and easy to learn, using long known concepts (its design was based on Visual Basic).
If you know Java, the language should be easy to grasp. The bigger hurdle is probably getting familiar with its library. Many things are much simpler to accomplish in Xojo than in Java, though.
Call me lazy, but that's what I like about Xojo. I also program in ObjC in Xcode, but for those little tools that just need to work quickly, Xojo is superior for whipping out a program that has a decent UI and works on many platforms with little to no tweaking.
If you need x-platform support, give it a try, for sure. If you only need the app to run on a single platform, and if you have skills with other dev systems, I'd advise against starting out with Xojo, to avoid the risks you get when going with such a small company that's offering closed-source software.
In your particular case, where you already have a half-working solution, I suggest you take a few days to familiarize yourself with Xojo and get a feeling for it (you can use Xojo for free as long as you don't build standalone apps with it). It's overall fairly stable, and I'm still using a 3-year-old version most of the time to develop and build my apps. So even if Xojo should go out of business suddenly, I'd not be too worried. As long as you stick with the simple functionality (e.g. don't use unique features such as XojoScript), you can still convert the app to another language later, but there's also a fair chance you'll never have to.
If you are looking for someone to take a look at your Xojo project, I'd recommend posting on the Xojo Find A Developer page at http://www.xojo.com/support/consultants.php, where all Pro developers will see it. The consultants that want to talk to you about it will then contact you. (Full disclosure: we, BKeeney Software, are on the list and would be happy to help you figure it all out.)

What are alternatives to the Java VM?

As Oracle sues Google over the Dalvik VM, it becomes clear that you cannot implement a Java VM without a license from Oracle. (EDIT: Matthew Flaschen points out that Oracle's claims may not be valid. Either way, we currently have a situation where Oracle threatens VM implementations.) That may become the death of open-source implementations of Java (like Apache Harmony).
I don't want to discuss the impact or the legitimacy of this lawsuit, but as a Java programmer I want to take a deeper look at the alternatives, to be prepared for every case. Since I see the creation of a compiler as a minor problem, my main interest is alternative VM implementations that serve a similar purpose to the JVM.
The VM I'm looking for should meet some conditions:
free of patent issues
an open-source implementation exists
potential for optimizations/good performance
platform independent (the VM can be ported to different platforms without major hurdles)
Please add some recommendations for me.
LLVM is a really good optimizing, low level virtual machine. It can support languages like C and C++, and does not have built in support for high level features like garbage collection.
VMKit is an implementation of the Java and CLI virtual machines on top of LLVM. Since it uses Java bytecode, this probably wouldn't help with the patent issues.
HLVM is another interesting high level virtual machine built on top of LLVM. It is probably different enough to avoid most well known patents, but it is mainly targeted at numerical computing and functional programming.
On the dynamically typed side, there is Parrot.
I am actually working on a compiler and VM for a language of my own design, but don't count on it ever being finished. ;-)
Keep in mind that any large piece of software will infringe on numerous patents, the important thing is how well known they are (and how much the patents' owners actively seek out infringers). Of course, the whole patent system is absurd, and we would be much better off getting rid of it.
I don't think there is any significant piece of software that is free from patent issues.
If you are an independent developer or working for a smaller company you probably won't get hit directly by the problems though. It's unlikely that big companies holding patents will go after lots of small claims - it's an expensive process and causes a lot of resentment. SCO tried something like that and it didn't work out too well for them.
I would concentrate on finding the best tool for the job without worrying too much about the patent issues, otherwise you will never get anything done.
GraalVM is a research project developed by Oracle Labs and already in production at Twitter. I can't believe no one has mentioned it yet. GraalVM is a promising extension of the Java virtual machine that supports additional languages and execution modes, running applications such as JavaScript, Python, Ruby, R, JVM-based languages, and LLVM-based languages such as C and C++. The GraalVM project includes a new high-performance Java compiler, itself called Graal, which can be used in a just-in-time configuration on the HotSpot VM or in an ahead-of-time configuration on SubstrateVM. The main goal of the project is to bring the performance of JVM-based languages up to that of native languages. Let's sum up the novel features this project offers, with a brief explanation (according to the docs) of why you should adopt it.
Polyglot: All languages (even LLVM-based) share the same VM and its capabilities. Zero overhead interoperability between programming languages allows you to write polyglot applications and select the best language for your task
Native: Native images compiled with GraalVM ahead-of-time improve the startup time and reduce the memory footprint of JVM-based applications.
Embeddable: GraalVM can be embedded in both managed and native applications. There are existing integrations into OpenJDK, Node.js, Oracle Database, and MySQL. GraalVM removes the isolation between programming languages and enables interoperability in a shared runtime. It can run either standalone or in the context of OpenJDK, Node.js, Oracle Database, or MySQL.
Performance: Graal benchmark reports show great performance improvements in almost all of its implementations thanks to the way that GraalVM performs object allocations
If you're not convinced by now that it's a good choice and a really awesome project, you can watch this talk by Christian Thalinger on why Graal is a good fit for Twitter.

Reusable knowledge going from Embedded to Desktop

I'm thinking about switching my path "slightly" by going into desktop development (VC++, MFC, C#, etc) after about 8 years within embedded telecom systems development (C, MAKE, Symbian, 100 compilers etc, etc).
My concern, however, is that my experience with embedded systems may not give me much value when going into desktop development. For example, the domain-specific problems and environments I've worked with for so long may not give me much to negotiate salary with, since they bear little worth on the desktop.
I think this place might be good for input on this.
So, the Q:
If you disregard the obvious generic experience on programming language level, give an example of something you have learned working with embedded systems that you could reuse when working in a desktop environment.
PS:
I should note that I'm no beginner in the desktop area - since many years back all my hobby projects are focused around desktop development.
Embedded engineers in general tend to be more disciplined when it comes to validating operations and dealing with finite resources.
This can also translate into coming up with an exception handling strategy earlier on.
The quintessential example is checking the return value of malloc. I have seen very little desktop software check it consistently, but it's commonplace in embedded environments.
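For illustration only (a generic sketch, not from any particular codebase), the habit looks like this in C:

    #include <stdlib.h>
    #include <string.h>

    /* Allocate and zero a buffer, planning for the failure path up front.
       On a small target, running out of heap is an expected event, not an
       afterthought. */
    static unsigned char *alloc_frame(size_t len)
    {
        unsigned char *buf = malloc(len);
        if (buf == NULL) {
            /* handle it: log, drop the frame, or enter a safe state */
            return NULL;
        }
        memset(buf, 0, len);
        return buf;
    }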
The discipline of keeping a clean, well-organized set of source code is the key skill that translates well to the "desktop experience". I've noticed that the embedded projects I've written and picked up are often WAY cleaner than their desktop counterparts.
Many desktop-only developers could benefit from the experience of making a program fit in 128K of FLASH and 32K of SRAM, not to mention communicating meaningfully with a user through only an LED or two and a couple of buttons. Making that a requirement might reduce some of the endemic code bloat in the applications industry. :-)
Even if you don't switch tracks to straight application development, the embedded experience translates well to driver development, as well as to low level utilities and to long running services. All of these are also domains where the disciplines that are nearly second-nature to a successful embedded developer remain valuable.
I was a desktop developer for almost 5yrs before switching to an embedded environment.
I find working on an embedded environment more challenging as we have to deal with memory limitations, slow CPU speed, cross-compilation issues, etc.
Having learned a lot of patience, discipline and low-level intricacies, desktop development should be as easy as a walk in the park.
State machines/event driven programming on embedded systems is not that different from event driven programming on the desktop. The depth of experience you have of these coding techniques on embedded systems, especially telecoms embedded systems, should make you a great desktop programmer.
Similarly, your experience with communications protocols should transfer nicely to the desktop. Most desktop applications have some involvement with the network.
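To make the state-machine point concrete, here is a minimal sketch of the switch-on-state pattern that shows up in both worlds (the states and events are hypothetical, purely for illustration):

    #include <stdio.h>

    typedef enum { STATE_IDLE, STATE_CONNECTED } state_t;
    typedef enum { EV_CONNECT, EV_DISCONNECT, EV_DATA } event_t;

    /* Dispatch one event against the current state and return the next state.
       On a microcontroller the events come from interrupts or a polling loop;
       on the desktop they come from a GUI or socket event loop, but the shape
       of the code is the same. */
    static state_t handle_event(state_t state, event_t ev)
    {
        switch (state) {
        case STATE_IDLE:
            if (ev == EV_CONNECT) {
                printf("link up\n");
                return STATE_CONNECTED;
            }
            break;
        case STATE_CONNECTED:
            if (ev == EV_DATA) {
                printf("payload received\n");
            } else if (ev == EV_DISCONNECT) {
                printf("link down\n");
                return STATE_IDLE;
            }
            break;
        }
        return state; /* ignore events that don't apply in this state */
    }

    int main(void)
    {
        state_t s = STATE_IDLE;
        s = handle_event(s, EV_CONNECT);
        s = handle_event(s, EV_DATA);
        s = handle_event(s, EV_DISCONNECT);
        return 0;
    }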

Embedded platform development in (!C)

I'm curious to see how popular the alternatives to C are in the embedded developer world e.g. Ada...
I've only ever used C (with a little bit of assembler), but then my targets have very limited resources. Is there a move elsewhere in this space to something else? What is winning the war in set-top boxes?
If !C what was the underlying reason?
Compiler support for target
Trace \ static analysis tools
other...
Thanks.
Forth is quite popular for embedded development.
Also, while Smalltalk is probably not popular in the embedded community, embedded development is definitely popular in the Smalltalk community.
When you say "embedded development", keep in mind that you have to consider the scale of the project.
When programming something on the scale of a microcontroller or the firmware for an ASIC, you tend to see C and assembly dominate the scene. Embedded developers tend to "specialize" in these languages since compilers for them are available for nearly every embedded target platform. If your project migrates from, say, a chip with a PowerPC core to a chip with an ARM core, you can be fairly confident that your C code will not be overly difficult to port over. Some chips do have compilers available for other languages, but typically they do not match the C compiler in terms of efficiency of the resulting binary. Since embedded systems are often low on resources, system designers want to make their code as efficient as possible (also one reason why you see a lot of assembly language code). I have seen development tools available for languages such as C++, Pascal, Basic, and others, but they are typically niche tools that are not mature enough to match the efficiency of the available C compilers. Debugging tools for these languages also tend to be harder to find than what is available for C/assembly.
You also mentioned set-top boxes. Embedded systems on this scale can pack the equivalent power of a desktop computer from 7-8 years ago. Their available RAM, storage space, and processing power allows them to run full-featured operating systems and interpreters for higher-level languages. On these more powerful systems you will still see C and assembly language being used (for driver code, if nothing else), but other languages (such as Java, Lua, Tcl, Ruby, etc) are becoming more and more common. Using interpreted languages makes porting code from one platform to another even easier, as long as the platform has sufficient resources to handle the overhead of the language interpreter. Any low-level code that interfaces directly with hardware (drivers) will still typically use assembly or C, since high-level languages don't always have the capability to do this sort of thing. Anything running as an application on top of the embedded operating system can usually be developed and tested inside an emulator or virtual machine, and so you will see a lot of code being developed in whatever language the developer happens to be comfortable with.
TLDR version: C is popular because it is a versatile language that nearly all developers are familiar with. Assembly is popular because it allows for low-level hardware access in ways that would otherwise be difficult or impossible. Interpreted/scripted languages such as Java are becoming more popular, but the resource requirements of the interpreters for these languages may be too much for some embedded systems to handle. The quality and variety of development/debugging tools available for C and assembly also make these options attractive.
Perhaps not quite the large step away from C you're looking for, but C++ is also reasonably popular for embedded projects.
I haven't used it myself, but Bascom is quite popular for AVR microcontrollers. It is a Basic IDE that lets you interact with the peripherals very easily. I've met hardware people that use it successfully.
Yes. Java is becoming more popular - many processors have added instructions that pertain primarily to Java and similar languages (.net). Also, uclinux runs on microcontrollers, so you can use practically any language for some of the larger micros.
Basic is still common, as is assembly.
You'll see Ada in certain gov't projects.
And some engineers are even putting Lua and other interpreters on their micros so their customers can extend the functionality.
But C is still dominant.
-Adam
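As a rough illustration of the "interpreter on the micro" idea mentioned above, this is the minimal shape of embedding Lua in a C host using the standard Lua C API (on a real microcontroller you would typically use a port such as eLua and load customer scripts from flash rather than a string literal):

    #include <stdio.h>
    #include <lua.h>
    #include <lauxlib.h>
    #include <lualib.h>

    int main(void)
    {
        lua_State *L = luaL_newstate();   /* create an interpreter instance */
        luaL_openlibs(L);                 /* load the standard Lua libraries */

        /* Run a customer-supplied extension script (inline here for brevity). */
        if (luaL_dostring(L, "print('hello from the extension script')") != 0) {
            /* on error, the message is left on top of the Lua stack */
            fprintf(stderr, "script error: %s\n", lua_tostring(L, -1));
        }

        lua_close(L);
        return 0;
    }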
In the early '90s I did a lot of embedded development on the 8051 using Intel PLM51 and the DCX51 operating system.
PLM is a very simple language, but very powerful.
We now use C
If you work in the smartcard space, you get to use Java Card. Yep, Java, on an 8-bit micro. It's kinda fun, actually. I get to develop in Eclipse, test ( & debug!) on the PC simulator, and can be confident that it'll run the same on the card. It's just such a pity Java is a terrible language for embedded apps :)
I've used EC++ (Embedded C++) quite extensively.
Also, PICBasic has been popular with the PIC'ers for eons now.
I have used Ada in an embedded project for military avionics because of customer requirements. There are lots of Ada tools for embedded development, but most of them are very expensive. Personally I would just use C.
There is a Pascal compiler for the 8051.
JAL (a Pascal-like language for PIC microcontrollers)
There is a group of folks working to make Lua a viable option for embedded work. They are targeting primarily 32-bit ARMs with 256K FLASH and 64K RAM or better, and seem happy with their work so far.
They are partly inspired by the classic BASIC-Stamp, a BASIC interpreter running in a moderately powerful PIC with the program itself stored in a serial EEPROM device.
At work, I am still maintaining a customer's embedded system that is written in a compiled flavor of BASIC running on a Zilog Z180 CPU. 1980s technology all around, with most of the system still built out of 24-pin DIP packages in sockets. The compiler runs under CP/M-80 running in a Z80 simulator, which itself runs in the MS-DOS simulator built into Windows. Aside from the sheer amazement that anything productive can be done this way (and that you can still buy 27C256 UV-erasable EPROMs, and that my nearly 20-year-old Data I/O PROM programmer still works), I really wish the customer could afford to move to a new hardware design so the system could be rewritten in a maintainable language.
It depends on the microcontroller. Many of them have C compilers, but the compilers are horrible; assembler is usually easy and the best performing, most efficient, etc. Ones like the MSP430, AVR, and ARM have good C compilers, and for those I would (and do) use C, depending on the problem.
I would stick to C or assembler; you are wasting memory, performance, and resources using anything else.
Pascal and Modula-2 work fine too. Essentially they are pretty much equivalent to C, except for the inability to do alloca (though some have that as an extension).
But the core problem will be the same with any !C compiler: which do you prefer, the better compiler/toolchain or your language of preference?
Although I like the Wirthian languages most, I simply use C and live with the consequences, simply because the toolchain is better.
There have been examples in the past (Pascals, or even tightly compiled Basics), but C is mostly the norm. I never understood why.
I worked on a device which ran some incredibly old version of python (1.4 or something). There was no way to debug it (other than printing debug messages) so when your code hit an exception everything would just stop and you scratched your head for an hour. Whenever you made a change and upgraded the code it was running, it took about 10 minutes to interpret and compile it.
Needless to say we scrapped that and replaced the microcontroller with one that ran C.
See this related question:
What languages are used for real-time systems programming.
In response to your "why" question, from the standpoint of government/military acquisition, there is a perception that Java (language, platform, etc...) is the lingua franca these days and that economies of scale in the language will reduce acquisition and maintenance cost. There's also a hope that one can efficiently train a competent Java programmer to be a reasonable RT/embedded programmer in Java faster than if they are required to learn a new language. This rationale is suspect, in my opinion, but it does answer the "why" question.
If you include the iPhone as an embedded platform, then Objective-C.
Considering how many times I've had a Java out-of-memory exception on my phone (usually whenever I do anything remotely interesting), I'd run away from Java like a bat out of a hot place.
I've heard that Erlang was designed for use in cell phones. I think Lisp is a good architecture for remote device support, if the device can handle the runtime.
A lot of home-brew users and small companies needing a cheap solution have found that the Tiny Tiger and the BASIC Stamp (using BASIC) meet their needs.

Practices for programming in a scientific environment? [closed]

Background
Last year, I did an internship in a physics research group at a university. In this group, we mostly used LabVIEW to write programs for controlling our setups, doing data acquisition and analyzing our data. For the first two purposes, that works quite OK, but for data analysis, it's a real pain. On top of that, everyone was mostly self-taught, so code that was written was generally quite a mess (no wonder that every PhD quickly decided to rewrite everything from scratch). Version control was unknown, and impossible to set up because of strict software and network regulations from the IT department.
Now, things actually worked out surprisingly OK, but how do people in the natural sciences do their software development?
Questions
Some concrete questions:
What languages/environments have you used for developing scientific software, especially data analysis? What libraries? (for example, what do you use for plotting?)
Was there any training for people without any significant background in programming?
Did you have anything like version control, and bug tracking?
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (especially physicists are stubborn people!)
Summary of answers thus far
The answers (or my interpretation of them) thus far: (2008-10-11)
Languages/packages that seem to be the most widely used:
LabVIEW
Python
with SciPy, NumPy, PyLab, etc. (See also Brandon's reply for downloads and links)
C/C++
MATLAB
Version control is used by nearly all respondents; bug tracking and other processes are much less common.
The Software Carpentry course is a good way to teach programming and development techniques to scientists.
How to improve things?
Don't force people to follow strict protocols.
Set up an environment yourself, and show the benefits to others. Help them to start working with version control, bug tracking, etc. themselves.
Reviewing other people's code can help, but be aware that not everyone may appreciate that.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I used to work for Enthought, the primary corporate sponsor of SciPy. We collaborated with scientists from the companies that contracted Enthought for custom software development. Python/SciPy seemed to be a comfortable environment for scientists. It's much less intimidating to get started with than say C++ or Java if you're a scientist without a software background.
The Enthought Python Distribution comes with all the scientific computing libraries, including analysis, plotting, 3D visualization, etc.
Was there any training for people without any significant background in programming?
Enthought does offer SciPy training and the SciPy community is pretty good about answering questions on the mailing lists.
Did you have anything like version control, bug tracking?
Yes, and yes (Subversion and Trac). Since we were working collaboratively with the scientists (and typically remotely from them), version control and bug tracking were essential. It took some coaching to get some scientists to internalize the benefits of version control.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
Make sure they are familiarized with the tool chain. It takes an investment up front, but it will make them feel less inclined to reject it in favor of something more familiar (Excel). When the tools fail them (and they will), make sure they have a place to go for help — mailing lists, user groups, other scientists and software developers in the organization. The more help there is to get them back to doing physics the better.
The course Software Carpentry is aimed specifically at people doing scientific computing and aims to teach the basics and lessons of software engineering, and how best to apply them to projects.
It covers topics like version control, debugging, testing, scripting and various other issues.
I've listened to about 8 or 9 of the lectures and think it comes highly recommended.
Edit: The MP3s of the lectures are available as well.
Nuclear/particle physics here.
Major programming work used to be done mostly in Fortran using CERNLIB (PAW, MINUIT, ...) and GEANT3; recently it has mostly been done in C++ with ROOT and Geant4. There are a number of other libraries and tools in specialized use, and LabVIEW sees some use here and there.
Data acquisition in my end of this business has often meant fairly low level work. Often in C, sometimes even in assembly, but this is dying out as the hardware gets more capable. On the other hand, many of the boards are now built with FPGAs which need gate twiddling...
One-offs, graphical interfaces, etc. use almost anything (Tcl/Tk used to be big, and I've been seeing more Perl/Tk and Python/Tk lately) including a number of packages that exist mostly inside the particle physics community.
Many people writing code have little or no formal training, and process is transmitted very unevenly by oral tradition, but most of the software group leaders take process seriously and read as much as necessary to make up their deficiencies in this area.
Version control for the main tools is ubiquitous. But many individual programmers neglect it for their smaller tasks. Formal bug tracking tools are less common, as are nightly builds, unit testing, and regression tests.
To improve things:
Get on the good side of the local software leaders
Implement the process you want to use in your own area, and encourage those you let in to use it too.
Wait. Physicists are empirical people. If it helps, they will (eventually!) notice.
One more suggestion for improving things.
Put a little time in to helping anyone you work directly with. Review their code. Tell them about algorithmic complexity/code generation/DRY or whatever basic thing they never learned because some professor threw a Fortran book at them once and said "make it work". Indoctrinate them on process issues. They are smart people, and they will learn if you give them a chance.
This might be slightly tangential, but hopefully relevant.
I used to work for National Instruments, R&D, where I wrote software for NI RF & Communication toolkits. We used LabVIEW quite a bit, and here are the practices we followed:
Source control. NI uses Perforce. We did the regular thing - dev/trunk branches, continuous integration, the works.
We wrote automated test suites.
We had a few people who came in with a background in signal processing and communication. We used to have regular code reviews, and best practices documents to make sure their code was up to the mark.
Despite the code reviews, there were a few occasions when "software guys" like me had to rewrite some of this code for efficiency.
I know exactly what you mean about stubborn people! We had folks who used to think that pointing out a potential performance improvement in their code was a direct personal insult! It goes without saying that this calls for good management. I thought the best way to deal with these folks was to go slowly, not press too hard for changes and, if necessary, be prepared to do the dirty work. [Example: write a test suite for their code.]
I'm not exactly a 'natural' scientist (I study transportation) but am an academic who writes a lot of my own software for data analysis. I try to write as much as I can in Python, but sometimes I'm forced to use other languages when I'm working on extending or customizing an existing software tool. There is very little programming training in my field. Most folks are either self-taught, or learned their programming skills from classes taken previously or outside the discipline.
I'm a big fan of version control. I used Vault running on my home server for all the code for my dissertation. Right now I'm trying to get the department to set up a Subversion server, but my guess is I will be the only one who uses it, at least at first. I've played around a bit with FogBugz, but unlike version control, I don't think that's nearly as useful for a one-man team.
As for encouraging others to use version control and the like, that's really the problem I'm facing now. I'm planning on forcing my grad students to use it on research projects they're doing for me, and encouraging them to use it for their own research. If I teach a class involving programming, I'll probably force the students to use version control there too (grading them on what's in the repository). As far as my colleagues and their grad students go, all I can really do is make a server available and rely on gentle persuasion and setting a good example. Frankly, at this point I think it's more important to get them doing regular backups than get them on source control (some folks are carrying around the only copy of their research data on USB flash drives).
1.) Scripting languages are popular these days for most things due to better hardware. Perl/Python/Lisp are prevalent for lightweight applications (automation, light computation); I see a lot of Perl at my work (computational EM) since we like Unix/Linux. For performance stuff, C/C++/Fortran are typically used. For parallel computing, well, we usually manually parallelize runs in EM as opposed to having a program implicitly do it (ie split up the jobs by look angle when computing radar cross sections).
2.) We just kind of throw people into the mix here. A lot of the code we have is very messy, but scientists are typically a scatterbrained bunch that don't mind that sort of thing. Not ideal, but we have things to deliver and we're severely understaffed. We're slowly getting better.
3.) We use SVN; however, we do not have bug tracking software. About as good as it gets for us is a txt file that tells you where specific bugs are.
4.) My suggestion for implementing best practices for scientists: do it slowly. As scientists, we typically don't ship products. No one in science makes a name for himself by having clean, maintainable code. They get recognition from the results of that code, typically. They need to see justification for spending time on learning software practices. Slowly introduce new concepts and try to get them to follow; they're scientists, so after their own empirical evidence confirms the usefulness of things like version control, they will begin to use it all the time!
I'd highly recommend reading "What Every Computer Scientist Should Know About Floating-Point Arithmetic". A lot of problems I encounter on a regular basis come from issues with floating point programming.
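A two-line taste of the kind of surprise that paper explains (plain C, nothing project-specific):

    #include <stdio.h>

    int main(void)
    {
        double a = 0.1 + 0.2;
        /* 0.1 and 0.2 have no exact binary representation, so the sum is not
           exactly 0.3 and a naive equality test fails. */
        printf("%.17g\n", a);        /* prints 0.30000000000000004 */
        printf("%d\n", a == 0.3);    /* prints 0 */
        return 0;
    }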
I am a physicist working in the field of condensed matter physics, building classical and quantum models.
Languages:
C++ -- very versatile: can be used for anything, good speed, but it can be a bit inconvenient when it comes to MPI
Octave -- good for some supplementary calculations, very convenient and productive
Libraries:
Armadillo/Blitz++ -- fast array/matrix/cube abstractions for C++
Eigen/Armadillo -- linear algebra
GSL -- to use with C
LAPACK/BLAS/ATLAS -- extremely big and fast, but less convenient (and written in FORTRAN)
Graphics:
GNUPlot -- it has very clean and neat output, but not that productive sometimes
Origin -- very convenient for plotting
Development tools:
Vim + plugins -- it works great for me
GDB -- a great debugging tool when working with C/C++
Code::Blocks -- I used it for some time and found it quite comfortable, but Vim is still better in my opinion.
I work as a physicist in a UK university.
Perhaps I should emphasise that different areas of research have different emphasis on programming. Particle physicists (like dmckee) do computational modelling almost exclusively and may collaborate on large software projects, whereas people in fields like my own (condensed matter) write code relatively infrequently. I suspect most scientists fall into the latter camp. I would say coding skills are usually seen as useful in physics, but not essential, much like physics/maths skills are seen as useful for programmers but not essential. With this in mind...
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Commonly, data analysis and plotting are done using generic data analysis packages such as IGOR Pro, Origin, or KaleidaGraph, which can be thought of as 'Excel plus'. These packages typically have a scripting language that can be used to automate things. More specialist analysis may have a dedicated utility for the job that generally will have been written a long time ago, that no one has the source for, and that is pretty buggy. Some more techie types might use the languages that have been mentioned (Python, R, MATLAB, with Gnuplot for plotting).
Control software is commonly done in LabVIEW, although we actually use Delphi which is somewhat unusual.
Was there any training for people without any significant background in programming?
I've been to seminars on grid computing, 3D visualisation, learning Boost etc. given by both universities I've been at. As an undergraduate we were taught VBA for Excel and MatLab but C/MatLab/LabVIEW is more common.
Did you have anything like version control, bug tracking?
No, although people do have personal development setups. Our code base is in a shared folder on a 'server' which is kept current with a synching tool.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
One step at a time! I am trying to replace the shared folder with something a bit more solid; perhaps finding an SVN client which mimics the current synching tool's behaviour would help.
I'd say though on the whole, for most natural science projects, time is generally better spent doing research!
Ex-academic physicist and now industrial physicist UK here:
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I mainly use MATLAB these days (easy to access visualisation functions and maths). I used to use Fortran a lot and IDL. I have used C (but I'm more a reader than a writer of C), Excel macros (ugly and confusing). I'm currently needing to be able to read Java and C++ (but I can't really program in them) and I've hacked Python as well. For my own entertainment I'm now doing some programming in C# (mainly to get portability / low cost / pretty interfaces). I can write Fortran with pretty much any language I'm presented with ;-)
Was there any training for people without any significant background in programming?
Most (all?) undergraduate physics courses will have a small programming course, usually in C, Fortran or MATLAB, but it's the real basics. I'd really like to have had some training in software engineering at some point (revision control / testing / designing medium-scale systems).
Did you have anything like version control, bug tracking?
I started using Subversion / TortoiseSVN relatively recently. Groups I've worked with in the past have used revision control. I don't know any academic group which uses formal bug tracking software. I still don't use any sort of systematic testing.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
I would try to introduce some software engineering ideas at undergraduate level and then reinforce them by practice at graduate level, also provide pointers to resources like the Software Carpentry course mentioned above.
I'd expect that a significant fraction of academic physicists will be writing software (not necessarily all though) and they are in dire need of at least an introduction to ideas in software engineering.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Python, NumPy and pylab (plotting).
Was there any training for people without any significant background in programming?
No, but I was working in a multimedia research lab, so almost everybody had a computer science background.
Did you have anything like version control, bug tracking?
Yes, Subversion for version control, Trac for bug tracing and wiki. You can get free bug tracker/version control hosting from http://www.assembla.com/ if their TOS fits your project.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!).
Make sure the infrastructure is set up and well maintained and try to sell the benefits of source control.
I'm a statistician at a university in the UK. Generally people here use R for data analysis, it's fairly easy to learn if you know C/Perl. Its real power is in the way you can import and modify data interactively. It's very easy to take a number of say CSV (or Excel) files and merge them, create new columns based on others and then throw that into a GLM, GAM or some other model. Plotting is trivial too and doesn't require knowledge of a whole new language (like PGPLOT or GNUPLOT.) Of course, you also have the advantage of having a bunch of built-in features (from simple things like mean, standard deviation etc all the way to neural networks, splines and GL plotting.)
Having said this, there are a couple of issues. With very large datasets R can become very slow (I've only really seen this with >50,000x30 datasets) and since it's interpreted you don't get the advantage of Fortran/C in this respect. But, you can (very easily) get R to call C and Fortran shared libraries (either from something like netlib or ones you've written yourself.) So, a usual workflow would be to:
Work out what to do.
Prototype the code in R.
Run some preliminary analyses.
Re-write the slow code into C or Fortran and call that from R (a sketch of the C side follows below).
Which works very well for me.
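For step 4, a minimal sketch of what the C side can look like with R's .C() interface (the function name running_sum is made up for the example; you would build it with R CMD SHLIB and load it with dyn.load() before calling .C() from R):

    /* Cumulative sum of a numeric vector, callable from R via .C().
       The .C() interface passes everything as pointers and expects a
       void return; results come back through the out argument. */
    void running_sum(double *x, int *n, double *out)
    {
        double acc = 0.0;
        for (int i = 0; i < *n; i++) {
            acc += x[i];
            out[i] = acc;
        }
    }

    /* From R (after dyn.load()-ing the shared library):
       .C("running_sum", as.double(x), as.integer(length(x)),
          out = double(length(x)))$out                          */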
I'm one of the only people in my department (of >100 people) using version control (in my case using git with github.com). This is rather worrying, but they just don't seem to be keen on trying it out and are content with passing zip files around (yuck).
My suggestion would be to continue using LabVIEW for the acquisition (and perhaps try to get your co-workers to agree on a toolset for acquisition and make it available to all), and then move to exporting the data into CSV (or similar) and doing the analysis in R. There's really very little point in re-inventing the wheel in this respect.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
My undergraduate physics department taught LabVIEW classes and used it extensively in its research projects.
The other alternative is MATLAB, in which I have no experience. There are camps for either product; each has its own advantages/disadvantages. Depending on what kind of problems you need to solve, one package may be more preferable than the other.
Regarding data analysis, you can use whatever kind of number cruncher you want. Ideally, you can do the hard calculations in language X and format the output to plot nicely in Excel, Mathcad, Mathematica, or whatever the flavor du jour plotting system is. Don't expect standardization here.
Did you have anything like version control, bug tracking?
Looking back, we didn't, and it would have been easier for us all if we did. Nothing like breaking everything and struggling for hours to fix it!
Definitely use source control for any common code. Encourage individuals to write their code in a manner that could be made more generic. This is really just coding best practices. Really, you should have them teaching (or taking) a computer science class so they can get the basics.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
There is a clear split between data acquisition (DAQ) and data analysis. Meaning, it's possible to standardize on the DAQ and then allow the scientists to play with the data in the program of their choice.
Another good option is Scilab. It has graphic modules à la LabVIEW, it has its own programming language and you can also embed Fortran and C code, for example. It's being used in public and private sectors, including big industrial companies. And it's free.
Regarding versioning, some prefer Mercurial, as it gives you more freedom in managing and defining the repositories. I have no experience with it, however.
For plotting I use Matplotlib. I will soon have to make animations, and I've seen good results using MEncoder. Here is an example including an audio track.
Finally, I suggest going modular, that is, trying to keep the main pieces of code in different files, so code revision, understanding, maintenance and improvement will be easier. I have written, for example, a Python module for file integrity testing, another for image processing sequences, etc.
You should also consider developing with a debugger that allows you to check variable contents at settable breakpoints in the code, instead of using print lines.
I have used Eclipse for Python and Fortran development (although I got a false bug report compiling a short Fortran program with it, but it may have been a bad configuration) and I'm starting to use the Eric IDE for Python. It allows you to debug, manage versioning with SVN, it has an embedded console, it can do refactoring with Bicycle Repair Man (it can use another tool, too), you have unittest, etc. A lighter alternative for Python is IDLE, included with Python since version 2.3.
As a few hints, I also suggest:
Not using single-character variables. When you want to search for occurrences, you will get results everywhere. Some argue that a decent IDE makes this easier, but then you will depend on having permanent access to the IDE. Even using ii, jj and kk can be enough, although this choice will depend on your language. (Double vowels would be less useful if code comments are made in Estonian, for instance.)
Commenting the code from the very beginning.
For critical applications, sometimes it's better to rely on older language/compiler versions (major releases), which are more stable and better debugged.
Of course you can have more optimized code in later versions, fixed bugs, etc., but I'm talking about using Fortran 95 instead of 2003, Python 2.5.4 instead of 3.0, and so on. (Especially when a new version breaks backwards compatibility.) Lots of improvements usually introduce lots of bugs. Still, this will depend on the specific application!
Note that this is a personal choice, many people could argue against this.
Use redundant and automated backup! (With versioning control).
Definitely, use Subversion to keep current, work-in-progress, and stable snapshot copies of source code. This includes C++, Java etc. for homegrown software tools, and quickie scripts for one-off processing.
With the strong leaning in science and applied engineering toward "lone cowboy" development methodology, the usual practice of organizing the repository into trunk, tags and whatever else it was - don't bother! Scientists and their lab technicians like to twirl knobs, wiggle electrodes and chase vacuum leaks. It's enough of a job to get everyone to agree to, say, Python/NumPy, or to follow some naming convention; forget trying to make them follow arcane software developer practices and conventions.
For source code management, centralized systems such as Subversion are superior for scientific use due to the clear single point of truth (SPOT). Logging of changes and the ability to recall versions of any file, without having to chase down where to find something, has huge record-keeping advantages. Tools like Git and Monotone: oh my gosh, the chaos I can imagine that would follow! Having clear-cut records of just what version of hack-job scripts were used while toying with the new sensor when that Higgs boson went by or that supernova blew up will lead to happiness.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Languages I have used for numerics and science-related stuff:
C (slow development, too much debugging, almost impossible to write reusable code)
C++ (and I learned to hate it -- development isn't as slow as C, but can be a pain; templates and classes were cool initially, but after a while I realized that I was fighting them all the time and finding workarounds for language design problems)
Common Lisp, which was OK, but not widely used for scientific computing. Not easy to integrate with C (compared to other languages), but it works
Scheme. This one became my personal choice.
My editor is Emacs, although I do use vim for quick stuff like editing configuration files.
For plotting, I usually generate a text file and feed it into gnuplot.
For data analysis, I usually generate a text file and use GNU R.
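The text-file-plus-gnuplot workflow is about as simple as it sounds; a throwaway sketch (the file name and the gnuplot one-liner are just examples):

    #include <math.h>
    #include <stdio.h>

    /* Dump x, sin(x) pairs to a plain text file, then plot it with e.g.
       gnuplot -e "plot 'data.txt' using 1:2 with lines" */
    int main(void)
    {
        FILE *f = fopen("data.txt", "w");
        if (f == NULL)
            return 1;
        for (int i = 0; i <= 100; i++) {
            double x = i * 0.1;
            fprintf(f, "%g %g\n", x, sin(x));
        }
        fclose(f);
        return 0;
    }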
I see lots of people here using FORTRAN (mostly 77, but some 90), lots of Java and some Python. I don't like those, so I don't use them.
Was there any training for people without any significant background in programming?
I think this doesn't apply to me, since I graduated in CS -- but where I work there is no formal training; people (engineers, physicists, mathematicians) do help each other.
Did you have anything like version control, bug tracking?
Version control is absolutely important! I keep my code and data on three different machines, on two different sides of the world -- in Git repositories. I sync them all the time (so I have version control and backups!). I don't do bug tracking, although I may start doing that.
But my colleagues don't use a BTS or VCS at all.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
First, I'd give them as much freedom as possible. (At the university where I work, I could choose between having someone install Ubuntu or Windows, or installing my own OS -- I chose to install my own. I don't have support from them and I'm responsible for anything that happens with my machine, including security issues, but I do whatever I want with the machine.)
Second, I'd see what they are used to, and make it work (need FORTRAN? We'll set it up. Need C++? No problem. Mathematica? OK, we'll buy a license). Then see how many of them would like to learn "additional tools" to help them be more productive (don't say "different" tools. Say "additional", so it won't seem like anyone will "lose" or "let go" or whatever). Start with editors, see if there are groups who would like to use VCS to sync their work (hey, you can stay home and send your code through SVN or GIT -- wouldn't that be great?) and so on.
Don't impose -- show examples of how cool these tools are. Make data analysis using R, and show them how easy it was. Show nice graphics, and explain how you've created them (but start with simple examples, so you can quickly explain them).
I would suggest F# as a potential candidate for performing science-related manipulations given its strong semantic ties to mathematical constructs.
Also, its support for units of measure, as written about here, makes a lot of sense for ensuring proper translation between the mathematical model and the implementation source code.
First of all, I would definitely go with a scripting language to avoid having to explain a lot of extra things (for example manual memory management is - mostly - ok if you are writing low-level, performance sensitive stuff, but for somebody who just wants to use a computer as an upgraded scientific calculator it's definitely overkill). Also, look around if there is something specific for your domain (as is R for statistics). This has the advantage of already working with the concepts the users are familiar with and having specialized code for specific situations (for example calculating standard deviations, applying statistical tests, etc in the case of R).
If you wish to use a more generic scripting language, I would go with Python. Two things it has going for it are:
The interactive shell where you can experiment
Its clear (although sometimes lengthy) syntax
As an added advantage, it has libraries for most of the things you would want to do with it.
I'm no expert in this area, but I've always understood that this is what MATLAB was created for. There is a way to integrate MATLAB with SVN for source control as well.