Should you standardize on a scripting language in a dev group? - scripting

At work we write a small to moderate amount of scripts to aid us in normal development. We have some people that are more comfortable in python, some in perl, some in php, etc...
Sometimes I think it is best to let people work in a language they are most comfortable with. This can mean that sometimes people can do a better job on a script (as they know more tricks in one language). It can also lead to less development time per script.
Other times I think that we would benefit the most from standardizing so that there might be more shared libraries and so that we don't get into the situation of "I can't work on that script because I don't know python".
Do you think that we should standardize or let people choose for each script they write?

I would be inclined to let people choose, and hire people who are comfortable at learning new things. Gaining basic fluency with almost any language should be easy enough for a good developer. And for small scripts where the original author isn't far away, it's even easier.
The second part of the above is the hard part, of course. But you'll end up with a more flexible, more knowledgeable team.

I would advocate standardizing on a couple languages. "Thou shalt use either Python, or Perl, or Ruby. Not Rexx, nor PHP, nor NewBatch, nor aught other, for thy brethren ought to be able to read thine writing without undue despair or cutting of their skins".

My opinion is that it is just like any other development environment. Do you standardize on one development language? Why? The same should be true for your scripting environment. Not only do I lean toward standardizing on a specific language I think you should standardize all the same things that you are hopefully doing in your programming environment (naming conventions, coding style, etc). Of course there are counter arguments and there will occasionally be exceptions, but hopefully they would be few and for valid reasons.

Like everything there are upsides and downsides.
As a developer I dislike being limited by "official company standards." More often than not these rules tend to constrain and stifle.
I think what is most important is to always have two people who can both read and write a script language on staff so one person can go on vacation or be sick without holding up the works.
For major components of your system it is probably better to standardize on a single language - as much as it pains me to say so.

I think the right place to ask this question is with your own team.
Your team should form a consensus as to whether or not they want a common platform or whether they want the freedom to choose the right tool for the job. There is no single best answer to this question any more than there is a single best answer for "what's the best scripting language".
Certainly there are advantages to each approach. If every team member is free to choose their own language, they might be more productive and your business will reap the benefit. On the other hand, if someone writes a critical tool in a language only she knows and then something happens to her (illness, new job, etc) you can be stuck with a single point of failure that no one knows how to fix.

So long as the languages in question are used for scripting, I agree with other commenters that it should be left to the devteam, and different languages should be tolerated in most cases.
If the language is used for your main codebase, you had better standardize on one.
If some developers complain about a script not written in their language, encourage them to learn it, particularly if it's one of the widespread ones like Ruby, Perl, et al.

No.
Your dev environment is hopefully a living entity that changes and is cultivated. New abilities (scripting languages) should be able to be accommodated, and older places that haven't been visited for years will someday get obviated by tools, or revisited from time to time and an evaluation will take place. Hopefully the dev scripts are a minor amount of development and automation. Chances are the team will drift towards a set of standards (ant, python, etc).

Per Project, anything checked in should probably follow a rule of minimal complexity.
Your team will be gone some day, and someone else will have to come in and maintain this. Please don't make them learn 5 languages or they are going to look for your names in the source code an hunt you down.
When we've picked up hack projects like that, we've pretty much considered the creators complete fools and made fun of them. They couldn't bother themselves to learn one scripting language???
That said, if you don't check it in--who cares?

Related

What features are important in a programming language for young beginners? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I was talking with some of the mentors in a local robotics competition for 7th and 8th level kids. The robot was using PBASIC and the parallax Basic Stamp. One of the major issues was this was short term project that required building the robot, teaching them to program in PBASIC and having them program the robot. All in only 2 hours or so a week over a couple months. PBASIC is kinda nice in that it has built in features to do everything, but information overload is possible to due this.
My thought are simplicity is key.
When you have kids struggling to grasp:
if X>10 then <DOSOMETHING>
There is not much point in throwing "proper" object oriented programming at them.
What are the essentials needed to foster an interest in programming?
Edit:
I like the notion of interpreted on the PC as learning tool. Due to the target platforms more than likely being somewhat resource constrained, I would like to target languages that are appropriate for embedded work. (Python and even Lua require more resources than the target likely to have. And I actually kinda like Lua.) I suppose that is one of the few virtues BASIC has, it has been ran on systems with less than 4K for over 30 years. C may not be a bad option if there are some "friendly" tools available such as Ch.
The most important is not a lot of boiler plate to make the simplest program run.
If you start of with a bunch of
import Supercalifragilistic from <expialidocious>
public void inherited security model=<apartment>
public : main .....
And tell them they "not to worry they aren't supposed to understand that" - you are going to put off both the brightest and the dumbest.
The nice thing about python is that printing "hello world" is print "hello world"
Fun, quick results. Capture the attention span of the kid.
Interpretive shells like most scripting languages offer (command line) that lets the student just type 1 or 2 liners is a big deal.
python:
>>> 1+1
2
Boom, instant feedback, kid thinks "the computer is talking back". Kids love that. Remember Eliza, anyone?
If they get bogged down in installing an IDE, creating a project, bleh bleh bleh, sometimes the tangents will take you away from the main topic.
BASIC is good too.
Look for some things online like "SIMPLE" : http://www.simplecodeworks.com/website.html
A team of researchers, beginning at Rice, then spreading out to Brown, Chicago, Northeastern, Northwestern, and Utah, have been studying this question for about 15 years. I can't summarize all their discoveries here, but here are some of their most important findings:
Irregular syntax can be a barrier to entry.
The language should be divided into concentric subsets, and you should choose a subset appropriate to the student's level of knowledge. For example, their smallest subset is called the "Beginning Student" language.
The compiler's error messages should be matched to the students' level of knowledge. If you are using subsets, different subsets might give different messages for the same error.
Beginners find it difficult to learn the phase distinction: separate phases for type checking and run time, with different kinds of errors. For this reason, beginners do better with a language where types are checked at run time, i.e., a dynamically typed language.
Beginners find it difficult to reason about mutable variables and mutable objects. If you teach pure functional programming, by contrast, you can leverage students' experience with high-school and middle-school algebra.
Beginning students are more engaged by an interactive programming environment than by the old edit-compile-link-go model.
Beginning students are engaged by splash and by interactivity. It's good if your language's standard library provides built-in support for creating and displaying images. It's better if those images are supported within the interactive programming environment, instead of requiring a separate player or viewer. And it's even better if your standard library can support moving images, or some other kind of animation.
Interestingly, they have got very good results with just 2D images. Even though we are all surrounded by examples of 3D computer graphics, students seem to get very engaged working with just two-dimensional images.
These results have been obtained primarily with college students, and they have been replicated at over 20 universities. However, the research team has also done some work with high-school and middle-school students. The first papers on that work are just coming out, so I'm less aware of the new findings and am not able to summarize them.
When you have kids struggling to grasp:
if X>10 then <DOSOMETHING>
Maybe it's a sign they shouldn't be doing programming?
What are the essentials needed to foster an interest in programming?
To see success with no or little effort. To create something running in a matter of minutes. A lot of programming languages can offer it, including the scary C++.
In order to avoid complication with #includes, multiple source files, modularization and compilation, why not have a look elsewhere? Try to write some Excel macros or use any other software with some basic built-in scripting language to automate certain tasks?
Another idea could be to play with web pages. It is not exactly programming, but at least easy to achieve something and show to others with pride.
This has been said on SO before, but... try Scratch. It's an incredible learning tool for kids. It teaches the basics of programming concepts in a hands-on and language-independent way. After a bit of playing around with it they can get down to learning a specific language's implementation of the concepts they already understand.
The common theme in languages that are easy for beginners - especially children to pick up is that there's very little barrier to entry, and immediate feedback. If "hello world" doesn't look a lot like print "Hello, world!", it's going to be harder for people to pick up. The following features along those lines come to mind:
Interpreted, or incrementally JIT compiled (which looks like an interpreter to the user)
No boilerplate
No attempt to enforce a specific programming style (e.g. Java requiring that everything be in a class definition, or Haskell enforcing purely functional design)
Dynamic typing
Implicit coercion (maybe)
A REPL
Breaking the problem (read program) down into a small set of sections (modules) that do one thing and do it very well.
You have to get them to stop thinking like a user and start thinking like a programmer. They need to take it one step at a time. Ask them what they have to think of in order to figure out the problem them selves and then write them down as steps. If you can then you break each step even more in the same mater. When done you will have the program in english making it simpler to program for real.
I did this with a friend that just could not get it and now he can. He used to look at something that I did and be bewildered and I would say that he has done more complex stuff than this.
One of the more persistently-present arguments I have had with other programmers is whether or not one's first language should require explicit type languages. Many are of the opinion that learning a language which requires you to explicitly declare type information is one which will teach you to program typefully. Conversely, it can be said that dynamic languages might present a less demanding learning curve. It goes either way, I suppose.
My advice: start with a simple model of how a computer works. I am particular to stack machines as good tools for teaching computation.
Remember that beginners are learning two disciplines at the same time: how computers work and the abstract logic involved (the basics of Computer Science), plus how to write programs that match their intended logic (learning a specific language's syntax and idioms). You have to address both concerns in an interwoven fashion in order for the students to quickly become effective. This is also the reason experienced programmers can often pick up new languages quickly.
It's worth noting Python grew out of a project for a language named ABC, which was targeted at beginners. For example, the required colon isn't strictly required, but was found to improve readability:
if some_condition:
do_this()
I got 3 words : Karel the Robot.
it's a really really simple 'language' that is designed to teach people the basis of programming :
Look for it on the web. You can look at this, though I never tried it :
http://karel.sourceforge.net/
While this isn't related to programming a robot, I think web programming is a great place to start with kids that age. It's how I started at that exact age. It easily translates to something kids understand if they use the web at all. Start with HTML, throw in Javascript, and soon they want to be doing features requiring server-side scripting or some sort, and things progress from there.
With the kind of kids who are already interested in robotics, though, I'd actually go for a different language like the ones already described. If you want to work in a field like robotics, you don't need to be convinced to try something hard. You need to be challenged.
A few years ago I saw a presentation at Ignite! Seattle from one of the people working on the project now known as Kodu who envisioned as a programming language for children. He spent time talking about what common language features could simply be thrown out in a programming environment meant to teach fundamentals.
A lot of cherished imperative constructs, like C-style for loops, were simply left out in favor of a simple object-messaging approach. Object-oriented programming isn't hard to understand when you think about "objects" and "messages"; the hard part is when you deal with things that programmers, but not children, care about, like inheritance and contracts and sweeping abstractions. I've got this thing (noun), now act on it (verb), in this way (adverb like quickly), when thing (sees/bumps into) something (with some attribute) (that's your if). Events are really conditions, and have all of the power of conditions, but it's up to the runtime to identify when those events happen.
This kind of event and messaging driven approach probably translates even better to robots than procedural programming would, anyway, so it might be a good way to look at the problem. Try not to think about what you'd "need" to know to program in C or Pascal or something; think about what you'd want to be able to make something do.

When its enough for a programming language that you need to switch to another?

I have wonder that many big applications (e.g. social websites such as facebook) are build with many languages into its platform.
They usually start with AJAX browser support, then scale down to PHP scripting, then move towards a powrful OOP technologie such as Java or .NET, and finally a primitive language to increase performance in crucial operations such as C.
My question is how should I determinate the edge of the layers between languages. When PHP, when Java, when C and so on. And the other question is if should those languages integrate in a vertcal fashion for simplicity and maintanance, or could it be cases when you decide to program on module of your app in Java and the other in native C.
What are the context variables that push me to move to a better performance language? (e.g. concurrency issues due increase of users)
Don't tell me that PHP overlaps .NET and Java Technologies. In a starter point it does, but when the network is overload you start seeing the diferences. I mean how can I achieve Multithreading in PHP as in Java with the same performance. The thing it's hard to answer my wuestion is becasue there is not so much reading about this. You maybe find some good books covering PHP, but few telling how when and why integrate different languages.
Each language was created for different purposes, Python is strong with string operations, Perl very powerful in batch scripting, PHP a very reliable application web server, C the mother of most popular languages.
Best,
Demian.
On one end of the scale, you move to a higher performance language whenever your profiling and measurements tell you that you have a bottleneck that can't be fixed with better algorithms, data structures, or other optimisation.
At the other end, you move to a higher level language (ie. more abstraction, better libraries) whenever your management allow you to do so. ;)
I believe most teams simply use what they are best familiar with.
There are also questions of licensing that can influence the decision.
That is, if you're talking about technologies that compare to each other and solve the problem on the same level (for example ASP.NET/JSF/JSP/PHP...). But you can't compare .NET with C++ for example, they are meant to solve different problems on different abstraction levels.
My criterion for any programming language is "does it help me to get the job done or does it just get in the way?" If the latter, then it's time to move on.
From an economical point of view the answer is easy: on a regular basis just look what will be cheaper. Either continue with the current technology and maybe stretch the envelope a bit more. Or switch to something new. When you compare the two alternatives the cost of the investment already done is not important anymore since you've already spent that money/effort. You only have to look ahead: cost of licenses, education, etc.
Of course this is easier said then done, but just sitting down with a few people, thinking about it, and maybe try to come up with some numbers already helps a lot. I have seen too many projects that continued with technology that really wasn't suited for the job anymore.
Also hard numbers don't tell the whole story. There will be resistance because of unfamiliar technology, experts who are losing their status, etc.
Identify the bottleneck
Solve bottleneck
Go to 1
I'm sure you can imagine that step 2 is the one where decisions like "What programming language do we use" and "where do we put the coffee machine" come into play. That's the basic rule.

A dynamic language to learn for curiosity's sake

This is sort of a "best language" question, but hopefully with enough of a twist to make it worthwhile.
As someone who only uses C and C#, I'm curious to learn a dynamic language to expand my knowledge. I don't know which to choose.
The thing is that my motivation isn't necessarily to create any "real world" projects, or projects that integrate with other systems, but rather just to learn.
With that said, for someone only familiar languages such as those I mentioned, and possibly ignoring obscurity and lack of support..
Which dynamic language would be the biggest departure?
Which would introduce the most novel concepts?
Which is the exemplar of dynamic languages?
I would suggest learning IronPython. As a language it will still be a significant departure for you, but you'll be able to use everything in the .NET framework that you're familiar with. (I usually think it's a good idea to try to vary just one aspect of development radically at a time... work your way through the different aspects one at a time, and you'll always be comfortable with part of what you're doing, which will help you learn the new part more quickly, IMO.)
Also, with C# 4 you'll be able to call into IronPython from your C# code, including using its dynamic features that way.
The functional languages (LISP, Scheme, etc.) are always worth checking out. They may be some of the bigger departures.
JavaScript is a great stepping stone to go from the C arena to the functional arena. From there you can mess around with JQuery, which, although not a language, forces you to do things in non procedural ways.
Another often overlooked language is SQL. It's obviously a niche language, and as Josh points out, not really 'dynamic', but acquiring a deep understanding of they way set based languages work can really progress a coder.
Careful, if you 'Learn' to much you may end up frustrated with the older languages.
Which dynamic language would be the biggest departure?
Which would introduce the most novel concepts?
I guess that would include Scheme, Erlang and Oz
Which is the exemplar of dynamic languages?
I'd say Ruby and Python
I would suggest any Lisp dialect or Smalltalk. These are dynamic and had heavy influence on the design of other, more mainstream languages.
They also include interesting concepts that are not found in other languages.
Another interesting dynamic language to have a look at is Lua.
It is hard to say, it is definitely a matter of personal taste in a lot of ways. I like learning Python but I am sure that you could learn just as many good things from Ruby about a dynamically typed language.
If you are used to C and C# then any dynamically typed language is going to be a departure. So I say you should use Python because that is what I like, and hopefully you will like it too. If you start using it and you hate it then try something else (like Ruby, Perl, PHP, etc.).
I would say that Lisp fits most, if not all, of your criteria. It's definitely a big departure from C/C++ and C#. It has got alot of novel concepts, and many would argue that it's hard to find a more dynamic language.
Barring Lisp, I myself would go for Ruby.
I'm going to have to vote for Common Lisp here. It is a highly dynamic language that can be adapted to just about anything. You get not only functional programming, but also OO, and even procedural if you so desire. And macros in Lisp are very interesting to study, since to my knowledge no other language has its equivalent.
Plus, developing in a functional style tends to help development in other languages as well. For example, I've noticed that I do OO primarily with immutable objects, thanks to concepts influenced by Lisp and Scheme. And with this, I've noticed an improvement in the stability and maintainability of my OO apps. Just my two cents.
I've been a C++ and C# developer for a long time, and recently started experimenting and learning other languages. I played with Ruby for little while and like it, but it wasn't what I wanted.
I ended up choosing Erlang. After reading about Erlang, I've decided that I really wanted to learn it. I'm not learning Erlang with any hopes of getting a job writing Erlang code. I'm learning Erlang only to become a better developer.
I really do like this language so far. It's only been about a month, and the syntax still gets me sometimes, but I can really see the power of the pattern matching and get excited to write it again. I struggled with the concept of everything being non-mutable at first. But this was mostly because I've "grown up" on C# and C++. C# is a great language and has some amazing tools, but you really have fun with some other languages, particularly something like Erlang. Just don't expect to land a job as a full time Erlang developer. (At least not yet).
For anyone curious, my hobby project is multiple player iPhone app connecting to an Erlang server. For a Windows developer, this has been a major change. But it has renewed my passion for programming, which really was my goal.
If you really want to go crazy but want things to at least be slightly familiar (I know that sounds like a contradiction but it's true), look at F#. It's a type-inferred language but it supports a lot of dynamic type properties. It's a functional language built on top of the CLR so you get full use of the .Net object system which is cool. Because it's a functional language, there are enough novel concepts to really work your brain.
If you really want to go for "biggest departure", Clojure might be of interest. It's a Lisp dialect built on the JVM. It's getting some pretty serious attention both in the Java and Lisp world. It might suit your purposes.

What programming things should every SysAdmin know?

This is a mirror-image of this question here:
https://serverfault.com/questions/4176/what-sysadmin-things-should-every-programmer-know
The original was posted by Nathan DeWitt. Credit should go to him. If you have a serverfault account, please visit his original question and bump it to show appreciation for this inquiry.
To paraphrase Nathan:
As a SysAdmin, we tend to take programmers for granted. The few times I've been without a good programmer have really made me appreciate what you guys do. When we're venturing into an environment without a programmer, what words of wisdom can you offer us?
Programmers need access to internet resources (such as StackOverflow). Don't block our access kthnx.
Sysadmins should know stuff that improves their efficiency.
That includes every feature of the shell they are working in (bash is pretty popular these days). There are many very usefull things in the manual that are worth looking into (from keyboard shortcuts through arrays to regexp).
It is very sad to see a junior sysadmin trying to manually log into 200 servers and switch a line in some file in /etc
If I would recriut an admin, I would demand an advanced (but not expert) knowledge of a programming language. It is not important what language is it, as He can easily switch to something else, but if he hasn't ever coded anything, he will not know how much (precious) time he is wasting manually doing things that can be done with a few lines of code.
Admins tend to like bash, php, python, awk and perl. Any of them will at least tripple their productivity on repetitive tasks.
On the other hand, coders should know httpd and loadbalancer's configuration or they will reinvent the wheel.
I would argue that a system administrator is a kind of programmer. So they should be able to think about programming; use scripting languages effectively, and write clear, well documented programs in them; have some understanding of how a programming project works.
Ideally, they should read up on agile methods, because quick development of small, changeable programs is the usual world they live in. They should understand testing, and if they use a modern scripting language, they should understand unit testing tools, some variant of xUnit.
Perl.
Definitely.

Practices for programming in a scientific environment? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
Background
Last year, I did an internship in a physics research group at a university. In this group, we mostly used LabVIEW to write programs for controlling our setups, doing data acquisition and analyzing our data. For the first two purposes, that works quite OK, but for data analysis, it's a real pain. On top of that, everyone was mostly self-taught, so code that was written was generally quite a mess (no wonder that every PhD quickly decided to rewrite everything from scratch). Version control was unknown, and impossible to set up because of strict software and network regulations from the IT department.
Now, things actually worked out surprisingly OK, but how do people in the natural sciences do their software development?
Questions
Some concrete questions:
What languages/environments have you used for developing scientific software, especially data analysis? What libraries? (for example, what do you use for plotting?)
Was there any training for people without any significant background in programming?
Did you have anything like version control, and bug tracking?
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (especially physicists are stubborn people!)
Summary of answers thus far
The answers (or my interpretation of them) thus far: (2008-10-11)
Languages/packages that seem to be the most widely used:
LabVIEW
Python
with SciPy, NumPy, PyLab, etc. (See also Brandon's reply for downloads and links)
C/C++
MATLAB
Version control is used by nearly all respondents; bug tracking and other processes are much less common.
The Software Carpentry course is a good way to teach programming and development techniques to scientists.
How to improve things?
Don't force people to follow strict protocols.
Set up an environment yourself, and show the benefits to others. Help them to start working with version control, bug tracking, etc. themselves.
Reviewing other people's code can help, but be aware that not everyone may appreciate that.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I used to work for Enthought, the primary corporate sponsor of SciPy. We collaborated with scientists from the companies that contracted Enthought for custom software development. Python/SciPy seemed to be a comfortable environment for scientists. It's much less intimidating to get started with than say C++ or Java if you're a scientist without a software background.
The Enthought Python Distribution comes with all the scientific computing libraries including analysis, plotting, 3D visualation, etc.
Was there any training for people without any significant background in programming?
Enthought does offer SciPy training and the SciPy community is pretty good about answering questions on the mailing lists.
Did you have anything like version control, bug tracking?
Yes, and yes (Subversion and Trac). Since we were working collaboratively with the scientists (and typically remotely from them), version control and bug tracking were essential. It took some coaching to get some scientists to internalize the benefits of version control.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
Make sure they are familiarized with the tool chain. It takes an investment up front, but it will make them feel less inclined to reject it in favor of something more familiar (Excel). When the tools fail them (and they will), make sure they have a place to go for help — mailing lists, user groups, other scientists and software developers in the organization. The more help there is to get them back to doing physics the better.
The course Software Carpentry is aimed specifically at people doing scientific computing and aims to teach the basics and lessons of software engineering, and how best to apply them to projects.
It covers topics like version control, debugging, testing, scripting and various other issues.
I've listened to about 8 or 9 of the lectures and think it is to be highly recommended.
Edit: The MP3s of the lectures are available as well.
Nuclear/particle physics here.
Major programing work used to be done mostly in Fortran using CERNLIB (PAW, MINUIT, ...) and GEANT3, recently it has mostly been done in C++ with ROOT and Geant4. There are a number of other libraries and tools in specialized use, and LabVIEW sees some use here and there.
Data acquisition in my end of this business has often meant fairly low level work. Often in C, sometimes even in assembly, but this is dying out as the hardware gets more capable. On the other hand, many of the boards are now built with FPGAs which need gate twiddling...
One-offs, graphical interfaces, etc. use almost anything (Tcl/Tk used to be big, and I've been seeing more Perl/Tk and Python/Tk lately) including a number of packages that exist mostly inside the particle physics community.
Many people writing code have little or no formal training, and process is transmitted very unevenly by oral tradition, but most of the software group leaders take process seriously and read as much as necessary to make up their deficiencies in this area.
Version control for the main tools is ubiquitous. But many individual programmers neglect it for their smaller tasks. Formal bug tracking tools are less common, as are nightly builds, unit testing, and regression tests.
To improve things:
Get on the good side of the local software leaders
Implement the process you want to use in your own area, and encourage those you let in to use it too.
Wait. Physicists are empirical people. If it helps, they will (eventually!) notice.
One more suggestion for improving things.
Put a little time in to helping anyone you work directly with. Review their code. Tell them about algorithmic complexity/code generation/DRY or whatever basic thing they never learned because some professor threw a Fortran book at them once and said "make it work". Indoctrinate them on process issues. They are smart people, and they will learn if you give them a chance.
This might be slightly tangential, but hopefully relevant.
I used to work for National Instruments, R&D, where I wrote software for NI RF & Communication toolkits. We used LabVIEW quite a bit, and here are the practices we followed:
Source control. NI uses Perforce. We did the regular thing - dev/trunk branches, continuous integration, the works.
We wrote automated test suites.
We had a few people who came in with a background in signal processing and communication. We used to have regular code reviews, and best practices documents to make sure their code was up to the mark.
Despite the code reviews, there were a few occasions when "software guys", like me had to rewrite some of this code for efficiency.
I know exactly what you mean about stubborn people! We had folks who used to think that pointing out a potential performance improvement in their code was a direct personal insult! It goes without saying that that this calls for good management. I thought the best way to deal with these folks is to go slowly, not press to hard for changes and if necessary be prepared to do the dirty work. [Example: write a test suite for their code].
I'm not exactly a 'natural' scientist (I study transportation) but am an academic who writes a lot of my own software for data analysis. I try to write as much as I can in Python, but sometimes I'm forced to use other languages when I'm working on extending or customizing an existing software tool. There is very little programming training in my field. Most folks are either self-taught, or learned their programming skills from classes taken previously or outside the discipline.
I'm a big fan of version control. I used Vault running on my home server for all the code for my dissertation. Right now I'm trying to get the department to set up a Subversion server, but my guess is I will be the only one who uses it, at least at first. I've played around a bit with FogBugs, but unlike version control, I don't think that's nearly as useful for a one-man team.
As for encouraging others to use version control and the like, that's really the problem I'm facing now. I'm planning on forcing my grad students to use it on research projects they're doing for me, and encouraging them to use it for their own research. If I teach a class involving programming, I'll probably force the students to use version control there too (grading them on what's in the repository). As far as my colleagues and their grad students go, all I can really do is make a server available and rely on gentle persuasion and setting a good example. Frankly, at this point I think it's more important to get them doing regular backups than get them on source control (some folks are carrying around the only copy of their research data on USB flash drives).
1.) Scripting languages are popular these days for most things due to better hardware. Perl/Python/Lisp are prevalent for lightweight applications (automation, light computation); I see a lot of Perl at my work (computational EM) since we like Unix/Linux. For performance stuff, C/C++/Fortran are typically used. For parallel computing, well, we usually manually parallelize runs in EM as opposed to having a program implicitly do it (ie split up the jobs by look angle when computing radar cross sections).
2.) We just kind of throw people into the mix here. A lot of the code we have is very messy, but scientists are typically a scatterbrained bunch that don't mind that sort of thing. Not ideal, but we have things to deliver and we're severely understaffed. We're slowly getting better.
3.) We use SVN; however, we do not have bug tracking software. About as good as it gets for us is a txt file that tells you where bugs specific bugs are.
4.) My suggestion for implementing best practices for scientists: do it slowly. As scientists, we typically don't ship products. No one in science makes a name for himself by having clean, maintainable code. They get recognition from the results of that code, typically. They need to see justification for spending time on learning software practices. Slowly introduce new concepts and try to get them to follow; they're scientists, so after their own empirical evidence confirms the usefulness of things like version control, they will begin to use it all the time!
I'd highly recommend reading "What Every Computer Scientist Should Know About Floating-Point Arithmetic". A lot of problems I encounter on a regular basis come from issues with floating point programming.
I am a physicist working in the field of condensed matter physics, building classical and quantum models.
Languages:
C++ -- very versatile: can be used for anything, good speed, but it can be a bit inconvenient when it comes to MPI
Octave -- good for some supplementary calculations, very convenient and productive
Libraries:
Armadillo/Blitz++ -- fast array/matrix/cube abstractions for C++
Eigen/Armadillo -- linear algebra
GSL -- to use with C
LAPACK/BLAS/ATLAS -- extremely big and fast, but less convenient (and written in FORTRAN)
Graphics:
GNUPlot -- it has very clean and neat output, but not that productive sometimes
Origin -- very convenient for plotting
Development tools:
Vim + plugins -- it works great for me
GDB -- a great debugging tool when working with C/C++
Code::Blocks -- I used it for some time and found it quite comfortable, but Vim is still better in my opinion.
I work as a physicist in a UK university.
Perhaps I should emphasise that different areas of research have different emphasis on programming. Particle physicists (like dmckee) do computational modelling almost exclusively and may collaborate on large software projects, whereas people in fields like my own (condensed matter) write code relatively infrequently. I suspect most scientists fall into the latter camp. I would say coding skills are usually seen as useful in physics, but not essential, much like physics/maths skills are seen as useful for programmers but not essential. With this in mind...
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Commonly data analysis and plotting is done using generic data analysis packages such as IGOR Pro, ORIGIN, Kaleidegraph which can be thought of as 'Excel plus'. These packages typically have a scripting language that can be used to automate. More specialist analysis may have a dedicated utility for the job that generally will have been written a long time ago, no-one has the source for and is pretty buggy. Some more techie types might use the languages that have been mentioned (Python, R, MatLab with Gnuplot for plotting).
Control software is commonly done in LabVIEW, although we actually use Delphi which is somewhat unusual.
Was there any training for people without any significant background in programming?
I've been to seminars on grid computing, 3D visualisation, learning Boost etc. given by both universities I've been at. As an undergraduate we were taught VBA for Excel and MatLab but C/MatLab/LabVIEW is more common.
Did you have anything like version control, bug tracking?
No, although people do have personal development setups. Our code base is in a shared folder on a 'server' which is kept current with a synching tool.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
One step at a time! I am trying to replace the shared folder with something a bit more solid, perhaps finding a SVN client which mimics the current synching tools behaviour would help.
I'd say though on the whole, for most natural science projects, time is generally better spent doing research!
Ex-academic physicist and now industrial physicist UK here:
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I mainly use MATLAB these days (easy to access visualisation functions and maths). I used to use Fortran a lot and IDL. I have used C (but I'm more a reader than a writer of C), Excel macros (ugly and confusing). I'm currently needing to be able to read Java and C++ (but I can't really program in them) and I've hacked Python as well. For my own entertainment I'm now doing some programming in C# (mainly to get portability / low cost / pretty interfaces). I can write Fortran with pretty much any language I'm presented with ;-)
Was there any training for people without any significant background in programming?
Most (all?) undergraduate physics course will have a small programming course usually on C, Fortran or MATLAB but it's the real basics. I'd really like to have had some training in software engineering at some point (revision control / testing / designing medium scale systems)
Did you have anything like version control, bug tracking?
I started using Subversion / TortoiseSVN relatively recently. Groups I've worked with in the past have used revision control. I don't know any academic group which uses formal bug tracking software. I still don't use any sort of systematic testing.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
I would try to introduce some software engineering ideas at undergraduate level and then reinforce them by practice at graduate level, also provide pointers to resources like the Software Carpentry course mentioned above.
I'd expect that a significant fraction of academic physicists will be writing software (not necessarily all though) and they are in dire need of at least an introduction to ideas in software engineering.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Python, NumPy and pylab (plotting).
Was there any training for people without any significant background in programming?
No, but I was working in a multimedia research lab, so almost everybody had a computer science background.
Did you have anything like version control, bug tracking?
Yes, Subversion for version control, Trac for bug tracing and wiki. You can get free bug tracker/version control hosting from http://www.assembla.com/ if their TOS fits your project.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!).
Make sure the infrastructure is set up and well maintained and try to sell the benefits of source control.
I'm a statistician at a university in the UK. Generally people here use R for data analysis, it's fairly easy to learn if you know C/Perl. Its real power is in the way you can import and modify data interactively. It's very easy to take a number of say CSV (or Excel) files and merge them, create new columns based on others and then throw that into a GLM, GAM or some other model. Plotting is trivial too and doesn't require knowledge of a whole new language (like PGPLOT or GNUPLOT.) Of course, you also have the advantage of having a bunch of built-in features (from simple things like mean, standard deviation etc all the way to neural networks, splines and GL plotting.)
Having said this, there are a couple of issues. With very large datasets R can become very slow (I've only really seen this with >50,000x30 datasets) and since it's interpreted you don't get the advantage of Fortran/C in this respect. But, you can (very easily) get R to call C and Fortran shared libraries (either from something like netlib or ones you've written yourself.) So, a usual workflow would be to:
Work out what to do.
Prototype the code in R.
Run some preliminary analyses.
Re-write the slow code into C or Fortran and call that from R.
Which works very well for me.
I'm one of the only people in my department (of >100 people) using version control (in my case using git with githuib.com.) This is rather worrying, but they just don't seem to be keen on trying it out and are content with passing zip files around (yuck.)
My suggestion would be to continue using LabView for the acquisition (and perhaps trying to get your co-workers to agree on a toolset for acquisition and making is available for all) and then move to exporting the data into a CSV (or similar) and doing the analysis in R. There's really very little point in re-inventing the wheel in this respect.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
My undergraduate physics department taught LabVIEW classes and used it extensively in its research projects.
The other alternative is MATLAB, in which I have no experience. There are camps for either product; each has its own advantages/disadvantages. Depending on what kind of problems you need to solve, one package may be more preferable than the other.
Regarding data analysis, you can use whatever kind of number cruncher you want. Ideally, you can do the hard calculations in language X and format the output to plot nicely in Excel, Mathcad, Mathematica, or whatever the flavor du jour plotting system is. Don't expect standardization here.
Did you have anything like version control, bug tracking?
Looking back, we didn't, and it would have been easier for us all if we did. Nothing like breaking everything and struggling for hours to fix it!
Definitely use source control for any common code. Encourage individuals to write their code in a manner that could be made more generic. This is really just coding best practices. Really, you should have them teaching (or taking) a computer science class so they can get the basics.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
There is a clear split between data aquisition (DAQ) and data analysis. Meaning, it's possible to standardize on the DAQ and then allow the scientists to play with the data in the program of their choice.
Another good option is Scilab. It has graphic modules à la LabVIEW, it has its own programming language and you can also embed Fortran and C code, for example. It's being used in public and private sectors, including big industrial companies. And it's free.
About versioning, some prefer Mercurial, as it gives more liberties managing and defining the repositories. I have no experience with it, however.
For plotting I use Matplotlib. I will soon have to make animations, and I've seen good results using MEncoder. Here is an example including an audio track.
Finally, I suggest going modular, this is, trying to keep main pieces of code in different files, so code revision, understanding, maintenance and improvement will be easier. I have written, for example, a Python module for file integrity testing, another for image processing sequences, etc.
You should also consider developing with the use a debugger that allows you to check variable contents at settable breakpoints in the code, instead using print lines.
I have used Eclipse for Python and Fortran developing (although I got a false bug compiling a Fortran short program with it, but it may have been a bad configuration) and I'm starting to use the Eric IDE for Python. It allows you to debug, manage versioning with SVN, it has an embedded console, it can do refactoring with Bicycle Repair Man (it can use another one, too), you have Unittest, etc. A lighter alternative for Python is IDLE, included with Python since version 2.3.
As a few hints, I also suggest:
Not using single-character variables. When you want to search appearances, you will get results everywhere. Some argue that a decent IDE makes this easier, but then you will depend on having permanent access to the IDE. Even using ii, jj and kk can be enough, although this choice will depend on your language. (Double vowels would be less useful if code comments are made in Estonian, for instance).
Commenting the code from the very beginning.
For critical applications sometimes it's better to rely on older language/compiler versions (major releases), more stable and better debugged.
Of course you can have more optimized code in later versions, fixed bugs, etc, but I'm talking about using Fortran 95 instead of 2003, Python 2.5.4 instead of 3.0, or so. (Specially when a new version breaks backwards compatibility.) Lots of improvements usually introduce lots of bugs. Still, this will depend on specific application cases!
Note that this is a personal choice, many people could argue against this.
Use redundant and automated backup! (With versioning control).
Definitely, use Subversion to keep current, work-in-progress, and stable snapshot copies of source code. This includes C++, Java etc. for homegrown software tools, and quickie scripts for one-off processing.
With the strong leaning in science and applied engineering toward "lone cowboy" development methodology, the usual practice of organizing the repository into trunk, tag and whatever else it was - don't bother! Scientists and their lab technicians like to twirl knobs, wiggle electrodes and chase vacuum leaks. It's enough of a job to get everyone to agree to, say Python/NumPy or follow some naming convention; forget trying to make them follow arcane software developer practices and conventions.
For source code management, centralized systems such as Subversion are superior for scientific use due to the clear single point of truth (SPOT). Logging of changes and ability to recall versions of any file, without having chase down where to find something, has huge record-keeping advantages. Tools like Git and Monotone: oh my gosh the chaos I can imagine that would follow! Having clear-cut records of just what version of hack-job scripts were used while toying with the new sensor when that Higgs boson went by or that supernova blew up, will lead to happiness.
What languages/environments have you
used for developing scientific
software, esp. data analysis? What
libraries? (E.g., what do you use for
plotting?)
Languages I have used for numerics and sicentific-related stuff:
C (slow development, too much debugging, almost impossible to write reusable code)
C++ (and I learned to hate it -- development isn't as slow as C, but can be a pain. Templates and classes were cool initially, but after a while I realized that I was fighting them all the time and finding workarounds for language design problems
Common Lisp, which was OK, but not widely used fo Sci computing. Not easy to integrate with C (if compared to other languages), but works
Scheme. This one became my personal choice.
My editor is Emacs, although I do use vim for quick stuff like editing configuration files.
For plotting, I usually generate a text file and feed it into gnuplot.
For data analysis, I usually generate a text file and use GNU R.
I see lots of people here using FORTRAN (mostly 77, but some 90), lots of Java and some Python. I don't like those, so I don't use them.
Was there any training for people
without any significant background in
programming?
I think this doesn't apply to me, since I graduated in CS -- but where I work there is no formal training, but people (Engineers, Physicists, Mathematicians) do help each other.
Did you have anything like version
control, bug tracking?
Version control is absolutely important! I keep my code and data in three different machines, in two different sides of the world -- in Git repositories. I sync them all the time (so I have version control and backups!) I don't do bug control, although I may start doing that.
But my colleagues don't BTS or VCS at all.
How would you go about trying to
create a decent environment for
programming, without getting too much
in the way of the individual
scientists (esp. physicists are
stubborn people!)
First, I'd give them as much freedom as possible. (In the University where I work I could chooe between having someone install Ubuntu or Windows, or install my own OS -- I chose to install my own. I don't have support from them and I'm responsible for anything that happens with my machins, including security issues, but I do whatever I want with the machine).
Second, I'd see what they are used to, and make it work (need FORTRAN? We'll set it up. Need C++? No problem. Mathematica? OK, we'll buy a license). Then see how many of them would like to learn "additional tools" to help them be more productive (don't say "different" tools. Say "additional", so it won't seem like anyone will "lose" or "let go" or whatever). Start with editors, see if there are groups who would like to use VCS to sync their work (hey, you can stay home and send your code through SVN or GIT -- wouldn't that be great?) and so on.
Don't impose -- show examples of how cool these tools are. Make data analysis using R, and show them how easy it was. Show nice graphics, and explain how you've created them (but start with simple examples, so you can quickly explain them).
I would suggest F# as a potential candidate for performing science-related manipulations given its strong semantic ties to mathematical constructs.
Also, its support for units-of-measure, as written about here makes a lot of sense for ensuring proper translation between mathematical model and implementation source code.
First of all, I would definitely go with a scripting language to avoid having to explain a lot of extra things (for example manual memory management is - mostly - ok if you are writing low-level, performance sensitive stuff, but for somebody who just wants to use a computer as an upgraded scientific calculator it's definitely overkill). Also, look around if there is something specific for your domain (as is R for statistics). This has the advantage of already working with the concepts the users are familiar with and having specialized code for specific situations (for example calculating standard deviations, applying statistical tests, etc in the case of R).
If you wish to use a more generic scripting language, I would go with Python. Two things it has going for it are:
The interactive shell where you can experiment
Its clear (although sometimes lengthy) syntax
As an added advantage, it has libraries for most of the things you would want to do with it.
I'm no expert in this area, but I've always understood that this is what MATLAB was created for. There is a way to integrate MATLAB with SVN for source control as well.