embedded scripting language deep rationale - game-engine

My question refers to:
Scripting languages and Game Dev/Programming
What I'd like to ask about is the deep rationale for embedding scripting languages into games. If you check wikipedia:
http://en.wikipedia.org/wiki/Freescape
Then you can see that a 3D engine from the 80's used an embedded scripting language. A game running on the ZX Spectrum 48, say, had an embedded interpreter for a scripting language. That embedding scripting languages has remained popular for so long seems to imply that there are deep-rooted reasons for embedding a scripting language into a game. What reasons to do so have remained from the 80's all the way to the present? Or have the reasons changed? The answers given in the referenced question cannot possibly all apply to past eras of computing.

I don’t know all the reasons from the past very well, but I imagine that all of today’s reasons also applied to game development on 80’s hardware and tools. It’s easier and faster to reuse “components” and edit them, and that makes it easier to accomplish different tasks.
Scripting makes it easier to build game prototypes. Compiling your (prototype) game can take too long when you only want to test a new feature or configuration. Scripts allow on-the-fly reprogramming so you can (re)test what you need. It’s not much different now; the reasons don’t seem to have changed much. The main reasons probably remain because games (especially "triple A" games) are much more complex than before.

All of the benefits of scripting now applied equally then. So you can immediately award points for:
separation of concerns — the level designer and the engine programmer almost needn't even be on talking terms as the specific game becomes a separate project from the main body of the code; and
faster prototyping and experimentation — Freescape's scripting language is compiled into byte code but the scripts tend to be very short versus the weight of the entire game, so trying out changes is faster.
In addition, the Freescape engine was a huge, expensive development for the time — they spent 14 months on it versus the month or two that most contemporary products received. Scripting all game logic achieved a significant secondary goal for them: portability.
The first Freescape game, Driller, uses exactly the same data and scripts across all of its releases. They had to build that out for the Spectrum, the CPC, the C64, the Amiga and the PC (a total of four CPU architectures, for a project that will have needed to be written in assembly language), but once that was done the game logic was write once, run anywhere — it became very easy to build subsequent games and release them simultaneously across the board.
Counting the 3D Construction Kit, they managed to publish five major games and two minor ones (Total Eclipse II and Castle Master II, both to juice up rereleases of the main title) over the following four years.
Portability isn't so much of a problem now as people use higher level languages and are generally isolated from the hardware, but prototyping and separation are just as important as ever.
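To make the "write once, run anywhere" point concrete, here is a minimal sketch of game logic stored as data and run by a tiny interpreter. The opcodes, the `run` function and the `ON_SHOT` script are all hypothetical names invented for illustration; this is not Freescape's actual instruction set or script format. The idea it shows is that only `run` would need porting to each platform, while the script itself stays identical.

```python
# Hypothetical opcodes for illustration only; not Freescape's real byte code.
PUSH, ADD, SUB, STORE, LOAD, JUMP_IF_ZERO, HALT = range(7)


def run(program, variables):
    """Interpret a list of (opcode, operand) pairs against a dict of variables."""
    stack = []
    pc = 0  # index of the next instruction
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(arg)
        elif op == LOAD:
            stack.append(variables[arg])
        elif op == STORE:
            variables[arg] = stack.pop()
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == SUB:
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == JUMP_IF_ZERO:
            if stack.pop() == 0:
                pc = arg
        elif op == HALT:
            break
    return variables


# Game logic as data: "when shot, lose 10 shield; if shield reaches 0, set dead".
ON_SHOT = [
    (LOAD, "shield"), (PUSH, 10), (SUB, None), (STORE, "shield"),  # 0-3
    (LOAD, "shield"), (JUMP_IF_ZERO, 7),                            # 4-5
    (HALT, None),                                                   # 6
    (PUSH, 1), (STORE, "dead"),                                     # 7-8
    (HALT, None),                                                   # 9
]

state = run(ON_SHOT, {"shield": 10, "dead": 0})
```

A level designer would only ever edit something like `ON_SHOT`; the interpreter is the part the engine programmer owns and ports.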

Example of modular game engine?

I found this a very interesting read: http://www.devmaster.net/articles/oo-game-design/
The author repeatedly says "Wow, this could be great, if implemented carefully. This is the future!". Well, not very useful. I need code, and most of all, I need a proof that this kind of design actually works.
Do you know of an example which implements some of the concepts mentioned in this article? Maybe a small open source game one could study? Or, at least, a place where similar concepts are discussed?
"Through the wise use of inheritance and over-ridden methods, and thoughtful careful design of the implied base classes"
Good design is good, of course, but virtual methods are certainly no panacea, and have a significant performance cost, especially on game consoles.
"Reusable in such a way that two entities created oblivious to each other could, utilizing such a development system, work together with NO changes to their code"
No. Any given entity in a real game will almost invariably have certain details that tie it to that game. It will depend on certain global render state (lighting conditions, shaders, shader parameters, etc.), and will be intimately tied to the core objects used by the physics system.
"This system is currently in a prototype stage, yet it has the capacity to produce mid-range quality games in as little as three months."
A number pulled entirely from the author's nether orifice.
"At the very least, such a system can be used to prototype games extremely rapidly, which has its own benefits."
This may be true, but even prototyping in games is challenging. It's impossible to evaluate a rough draft of a game if it's running at half speed. Performance always matters.
In short, he's got some OK ideas in there, but it sure as hell isn't the One True Way to make games. What he describes is a massively decoupled and fine-grained architecture. That sounds nice in principle but will almost invariably lead to poor performance and an unmaintainable soup of tiny classes.

When is a programming language no longer enough, so that you need to switch to another?

I have noticed that many big applications (e.g. social websites such as Facebook) are built with many languages in their platform.
They usually start with AJAX in the browser, then go down to PHP scripting, then move towards a powerful OOP technology such as Java or .NET, and finally use a low-level language such as C to increase performance in crucial operations.
My question is how I should determine the boundaries between these language layers: when PHP, when Java, when C, and so on. The other question is whether those languages should be integrated in a vertical fashion for simplicity and maintenance, or whether there are cases where you decide to program one module of your app in Java and another in native C.
What are the context variables that push me to move to a higher-performance language? (e.g. concurrency issues due to an increase in users)
Don't tell me that PHP overlaps with .NET and Java technologies. At the start it does, but when the network is overloaded you start seeing the differences. I mean, how can I achieve multithreading in PHP with the same performance as in Java? My question is hard to answer because there is not much reading about this. You may find some good books covering PHP, but few that tell you how, when and why to integrate different languages.
Each language was created for a different purpose: Python is strong at string operations, Perl is very powerful for batch scripting, PHP is a very reliable web application server, and C is the mother of most popular languages.
Best,
Demian.
On one end of the scale, you move to a higher performance language whenever your profiling and measurements tell you that you have a bottleneck that can't be fixed with better algorithms, data structures, or other optimisation.
At the other end, you move to a higher level language (i.e. more abstraction, better libraries) whenever your management allows you to do so. ;)
I believe most teams simply use what they are most familiar with.
There are also questions of licensing that can influence the decision.
That is, if you're talking about technologies that compare to each other and solve the problem on the same level (for example ASP.NET/JSF/JSP/PHP...). But you can't compare .NET with C++ for example, they are meant to solve different problems on different abstraction levels.
My criterion for any programming language is "does it help me to get the job done or does it just get in the way?" If the latter, then it's time to move on.
From an economic point of view the answer is easy: on a regular basis, just look at what will be cheaper. Either continue with the current technology and maybe stretch the envelope a bit more, or switch to something new. When you compare the two alternatives, the cost of the investment already made is no longer important, since you've already spent that money/effort. You only have to look ahead: cost of licenses, education, etc.
Of course this is easier said than done, but just sitting down with a few people, thinking about it, and maybe trying to come up with some numbers already helps a lot. I have seen too many projects that continued with technology that really wasn't suited for the job anymore.
Also hard numbers don't tell the whole story. There will be resistance because of unfamiliar technology, experts who are losing their status, etc.
1. Identify the bottleneck
2. Solve bottleneck
3. Go to 1
I'm sure you can imagine that step 2 is the one where decisions like "What programming language do we use" and "where do we put the coffee machine" come into play. That's the basic rule.
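As a rough illustration of the "identify the bottleneck" step, the sketch below profiles a request handler with Python's standard `cProfile` and `pstats` modules. The functions `handle_request` and `slow_sum_of_squares` are made-up stand-ins for real application code; the point is only that the decision to drop down to a faster language should start from a measurement like this, not from a guess.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive hot loop standing in for a real bottleneck."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request():
    """Hypothetical request handler that spends most of its time in one call."""
    return slow_sum_of_squares(200_000)


def profile(func):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, out.getvalue()


result, report = profile(handle_request)
print(report)
```

If the report shows one function dominating the cumulative time and no better algorithm is available, that function is the candidate for rewriting in a lower-level language; everything else can stay where it is.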

Significant Challengers to OOP

From what I understand, OOP is the most commonly used paradigm for large scale projects. I also know that some smaller subsets of big systems use other paradigms (e.g. SQL, which is declarative), and I also realize that at lower levels of computing OOP isn't really feasible. But it seems to me that usually the pieces of higher level solutions are almost always put together in an OOP fashion.
Are there any scenarios where a truly non-OOP paradigm is actually a better choice for a large-scale solution? Or is that unheard of these days?
I've wondered this ever since I started studying CS; it's easy to get the feeling that OOP is some nirvana of programming that will never be surpassed.
In my opinion, the reason OOP is used so widely isn't so much that it's the right tool for the job. I think it's more that a solution can be described to the customer in a way that they understand.
A CAR is a VEHICLE that has an ENGINE. That's programming and real world all in one!
It's hard to comprehend anything that can fit the programming and real world quite so elegantly.
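The sentence about the car maps onto code almost word for word, which is the appeal being described. A minimal sketch, with class names and fields invented purely for illustration:

```python
class Engine:
    def __init__(self, horsepower):
        self.horsepower = horsepower


class Vehicle:
    def __init__(self, wheels):
        self.wheels = wheels

    def describe(self):
        return f"vehicle with {self.wheels} wheels"


class Car(Vehicle):  # a Car IS A Vehicle (inheritance)
    def __init__(self, engine):
        super().__init__(wheels=4)
        self.engine = engine  # a Car HAS AN Engine (composition)

    def describe(self):
        return f"car: {super().describe()}, {self.engine.horsepower} hp engine"


car = Car(Engine(120))
print(car.describe())
```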
Linux is a large-scale project that's very much not OOP. And it wouldn't have a lot to gain from it either.
I think OOP has a good ring to it, because it has associated itself with good programming practices like encapsulation, data hiding, code reuse, modularity etc. But these virtues are by no means unique to OOP.
You might have a look at Erlang, co-created by Joe Armstrong.
Wikipedia:
"Erlang is a general-purpose concurrent programming language and runtime system. The sequential subset of Erlang is a functional language, with strict evaluation, single assignment, and dynamic typing."
Joe Armstrong:
“Because the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.”
The promise of OOP was code reuse and easier maintenance. I am not sure it delivered. I see things such as .NET as being much the same as the C libraries we used to get from various vendors. You can call that code reuse if you want. As for maintenance, bad code is bad code. OOP did not help.
I'm the biggest fan of OOP, and I practice OOP every day.
It's the most natural way to write code, because it resembles real life.
Though, I realize that OOP's virtual method dispatch might cause performance issues.
Of course that depends on your design, the language and the platform you chose (systems written in Garbage collection based languages such as Java or C# might perform worse than systems which were written in C++ for example).
I guess in Real-time systems, procedural programming may be more appropriate.
Note that not all projects that claim to be OOP are in fact OOP. Sometimes the majority of the code is procedural, or the data model is anemic, and so on...
Zyx, you wrote, "Most of the systems use relational databases ..."
I'm afraid there's no such thing. The relational model will be 40 years old next year and has still never been implemented. I think you mean, "SQL databases." You should read anything by Fabian Pascal to understand the difference between a relational dbms and an SQL dbms.
" ... the relational model is usually chosen due to its popularity,"
True, it's popular.
" ... availability of tools,"
Alas without the main tool necessary: an implementation of the relational model.
" support,"
Yup, the relational model has fine support, I'm sure, but it's entirely unsupported by a dbms implementation.
" and the fact that the relational model is in fact a mathematical concept,"
Yes, it's a mathematical concept, but, not being implemented, it's largely restricted to the ivory towers. String theory is also a mathematical concept but I wouldn't implement a system with it.
In fact, despite its being a mathematical concept, it is certainly not a science (as in computer science) because it lacks the first requirement of any science, that it is falsifiable: there's no implementation of a relational dbms against which we can check its claims.
It's pure snake oil.
" ... contrary to OOP."
And contrary to OOP, the relational model has never been implemented.
Buy a book on SQL and get productive.
Leave the relational model to unproductive theorists.
Apparently you can use C# with five different programming paradigms, C++ with three, etc.
Software construction is not akin to Fundamental Physics. Physics strives to describe reality using paradigms which may be challenged by new experimental data and/or theories. Physics is a science which searches for a "truth", in a way that Software construction doesn't.
Software construction is a business. You need to be productive, i.e. to achieve some goals for which someone will pay money. Paradigms are used because they are useful to produce software effectively. You don't need everyone to agree. If I do OOP and it's working well for me, I don't care if a "new" paradigm would potentially be 20% more useful to me if I had the time and money to learn it and later rethink the whole software structure I'm working in and redesign it from scratch.
Also, you may be using another paradigm and I'll still be happy, in the same way that I can make money running a Japanese food restaurant and you can make money with a Mexican food restaurant next door. I don't need to discuss with you whether Japanese food is better than Mexican food.
I doubt OOP is going away any time soon; it just fits our problems and mental models far too well.
What we're starting to see though is multi-paradigm approaches, with declarative and functional ideas being incorporated into object oriented designs. Most of the newer JVM languages are a good example of this (JavaFX Script, Scala, Clojure, etc.) as well as LINQ and F# on the .NET platform.
It's important to note that I'm not talking about replacing OO here, but about complementing it.
JavaFX has shown that a declarative solution goes beyond SQL and XSLT, and can also be used for binding properties and events between visual components in a GUI
For fault tolerant and highly concurrent systems, functional programming is a very good fit, as demonstrated by the Ericsson AXD301 (programmed using Erlang)
So... as concurrency becomes more important and FP becomes more popular, I imagine that languages not supporting this paradigm will suffer. This includes many that are currently popular such as C++, Java and Ruby, though JavaScript should cope very nicely.
Using OOP makes the code easier to manage (as in modify/update/add new features) and understand. This is especially true with bigger projects. Because modules/objects encapsulate their data and operations on that data it is easier to comprehend the functionality and the big picture.
The benefit of OOP is that it is easier to discuss (with other developers/management/customer) a LogManager or OrderManager, each of which encompasses specific functionality, than to describe 'a group of methods that dump the data in file' and 'the methods that keep track of order details'.
So I guess OOP is helpful especially with big projects, but there are always new concepts turning up, so keep a lookout for new stuff in the future, evaluate it and keep what is useful.
People like to think of various things as "objects" and classify them, so it is no wonder that OOP is so popular. However, there are some areas where OOP has not gained much popularity. Most of the systems use relational databases rather than object databases. Even if the latter hold some notable records and are better for some types of tasks, the relational model is usually chosen due to its popularity, availability of tools, support and the fact that the relational model is in fact a mathematical concept, contrary to OOP.
Another area where I have never seen OOP is the software building process. All the configuration and make scripts are procedural, partially because of the lack of the support for OOP in shell languages, partially because OOP is too complex for such tasks.
Slightly controversial opinion from me but I don't find OOP, at least of a kind that is popularly applied now, to be that helpful in producing the largest scale software in my particular domain (VFX, which is somewhat similar in scene organization and application state as games). I find it very useful on a medium to smaller scale. I have to be a bit careful here since I've invited some mobs in the past, but I should qualify that this is in my narrow experience in my particular type of domain.
The difficulty I've often found is that if you have all these small concrete objects encapsulating data, they now want to all talk to each other. The interactions between them can get extremely complex, like so (except much, much more complex in a real application spanning thousands of objects):
[Figure: graph of interactions between objects]
And this is not a dependency graph directly related to coupling so much as an "interaction graph". There could be abstractions to decouple these concrete objects from each other. Foo might not talk to Bar directly. It might instead talk to it through IBar or something of this sort. This graph would still connect Foo to Bar since, albeit being decoupled, they still talk to each other.
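A small sketch of that distinction, using the Foo, Bar and IBar names from the paragraph above (the `poke` and `update` methods are hypothetical). Foo's source never mentions Bar, so the two are decoupled at the type level, yet at runtime every `Foo.update` still changes Bar's state, which is the edge that remains in the interaction graph.

```python
from typing import Protocol


class IBar(Protocol):
    """Abstraction that Foo depends on instead of the concrete Bar."""

    def poke(self) -> None: ...


class Bar:
    def __init__(self) -> None:
        self.pokes = 0

    def poke(self) -> None:
        self.pokes += 1  # side effect on Bar's encapsulated state


class Foo:
    def __init__(self, bar: IBar) -> None:
        self._bar = bar  # Foo only knows the interface

    def update(self) -> None:
        self._bar.poke()  # ...but it still talks to whatever is behind it


bar = Bar()
foo = Foo(bar)
foo.update()
foo.update()
```

Reasoning about what `foo.update()` does still requires knowing which object sits behind `IBar`, and that is the part that becomes hard with thousands of objects.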
And all this communication between small and medium-sized objects which make up their own little ecosystem, if applied to the entire scale of a large codebase in my domain, can become extremely difficult to maintain. And it becomes so difficult to maintain because it's hard to reason about what happens with all these interactions between objects with respect to things like side effects.
Instead what I've found useful is to organize the overall codebase into completely independent, hefty subsystems that access a central "database". Each subsystem then inputs and outputs data. Some other subsystems might access the same data, but without any system talking directly to another.
Each individual system then no longer attempts to encapsulate state. It doesn't try to become its own ecosystem. It instead reads and writes data in the central database.
Of course in the implementation of each subsystem, they might use a number of objects to help implement them. And that's where I find OOP very useful: in the implementation of these subsystems. But each of these subsystems constitutes a relatively medium to small-scale project, not too large, and it's at that medium to smaller scale that I find OOP very useful.
"Assembly-Line Programming" With Minimum Knowledge
This allows each subsystem to just focus on doing its thing with almost no knowledge of what's going on in the outside world. A developer focusing on physics can just sit down with the physics subsystem and know little about how the software works except that there's a central database from which he can retrieve things like motion components (just data) and transform them by applying physics to that data. And that makes his job very simple and makes it so he can do what he does best with the minimum knowledge of how everything else works. Input central data and output central data: that's all each subsystem has to do correctly for everything else to work. It's the closest thing I've found in my field to "assembly line programming" where each developer can do his thing with minimum knowledge about how the overall system works.
Testing is also quite simple because of the narrow focus of each subsystem. We're no longer mocking concrete objects with dependency injection so much as generating a minimum amount of data relevant to a particular system and testing whether the particular system provides the correct output for a given input. With so few systems to test (just dozens can make up a complex piece of software), it also reduces the number of tests required substantially.
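A minimal sketch of what this can look like, under the assumption of a very simple central store keyed by component type. The `Database` class, the `Motion` component and `physics_system` are illustrative names, not taken from any particular engine. The physics code knows nothing except how to fetch motion components and write them back.

```python
from dataclasses import dataclass


@dataclass
class Motion:
    """Plain data; no behaviour and no knowledge of who uses it."""
    position: float
    velocity: float


class Database:
    """Hypothetical central store: component type -> {entity id -> component}."""

    def __init__(self):
        self._components = {}

    def add(self, entity, component):
        self._components.setdefault(type(component), {})[entity] = component

    def all(self, component_type):
        return self._components.get(component_type, {}).items()


def physics_system(db, dt):
    """Reads motion components from the central database and updates them."""
    for _entity, motion in db.all(Motion):
        motion.position += motion.velocity * dt


# Testing needs no mocks: generate minimal input data, run, check the output.
db = Database()
db.add(1, Motion(position=0.0, velocity=2.0))
physics_system(db, dt=0.5)
```

The test setup is just three lines of data, which is the "minimum amount of data relevant to a particular system" described above.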
Breaking Encapsulation
The system then turns into a rather flat pipeline transforming central application state through independent subsystems that are practically oblivious to each other's existence. One might sometimes push a central event to the database which another system processes, but that other system is still oblivious about where that event came from. I've found this is the key to tackling complexity at least in my domain, and it is effectively through an entity-component system.
Yet it resembles something closer to procedural or functional programming at the broad scale to decouple all these subsystems and let them work with minimal knowledge of the outside world since we're breaking encapsulation in order to achieve this and avoid requiring the systems to talk to each other. When you zoom in, then you might find your share of objects being used to implement any one of these subsystems, but at the broadest scale, the system resembles something other than OOP.
Global Data
I have to admit that I was very hesitant about applying ECS at first to an architectural design in my domain since, first, it hadn't been done before to my knowledge in popular commercial competitors (3DS Max, SoftImage, etc), and second, it looks like a whole bunch of globally-accessible data.
I've found, however, that this is not a big problem. We can still very effectively maintain invariants, perhaps even better than before. The reason is due to the way the ECS organizes everything into systems and components. You can rest assured that an audio system won't try to mutate a motion component, e.g., not even under the hackiest of situations. Even with a poorly-coordinated team, it's very improbable that the ECS will degrade into something where you can no longer reason about which systems access which component, since it's rather obvious on paper and there are virtually no reasons whatsoever for a certain system to access an inappropriate component.
To the contrary, it often removed many of the former temptations for hacky things with the data wide open, since a lot of the hacky things done in our former codebase under loose coordination and crunch time were done in hasty attempts to x-ray abstractions and try to access the internals of the ecosystems of objects. The abstractions started to become leaky as a result of people, in a hurry, trying to just get and do things with the data they wanted to access. They were basically jumping through hoops trying to just access data, which led to interface designs degrading quickly.
There is something vaguely resembling encapsulation still just due to the way the system is organized since there's often only one system modifying a particular type of component (two in some exceptional cases). But they don't own that data, they don't provide functions to retrieve that data. The systems don't talk to each other. They all operate through the central ECS database (which is the only dependency that has to be injected into all these systems).
Flexibility and Extensibility
This is already widely discussed in external resources about entity-component systems, but they are extremely flexible at adapting to radically new design ideas in hindsight, even concept-breaking ones like a suggestion for a creature which is a mammal, insect, and plant that sprouts leaves under sunlight all at once.
One of the reasons is because there are no central abstractions to break. You introduce some new components if you need more data for this or just create an entity which strings together the components required for a plant, mammal, and insect. The systems designed to process insect, mammal, and plant components then automatically pick it up and you might get the behavior you want without changing anything besides adding a line of code to instantiate an entity with a new combo of components. When you need whole new functionality, you just add a new system or modify an existing one.
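The creature example can be sketched in a few lines. Everything here (the component fields, the `world` list, the two systems) is a made-up illustration of the idea rather than a real ECS library: an entity is just a bag of components, and each system picks up any entity that carries the component it cares about.

```python
from dataclasses import dataclass


@dataclass
class Plant:
    leaves: int = 0


@dataclass
class Mammal:
    body_temperature: float = 37.0


@dataclass
class Insect:
    legs: int = 6


def create_entity(world, *components):
    """An entity is just a mapping from component type to component data."""
    entity = {type(c): c for c in components}
    world.append(entity)
    return entity


def photosynthesis_system(world, sunlight):
    """Processes every entity that has a Plant component, whatever else it has."""
    if not sunlight:
        return
    for entity in world:
        if Plant in entity:
            entity[Plant].leaves += 1


def leg_count_system(world):
    """Reads every entity that has an Insect component."""
    return sum(entity[Insect].legs for entity in world if Insect in entity)


world = []
fern = create_entity(world, Plant())
# The "concept-breaking" creature is one line: a new combination of components.
chimera = create_entity(world, Plant(), Mammal(), Insect())

photosynthesis_system(world, sunlight=True)
```

Neither system was changed to support the new creature; it sprouts leaves and counts as an insect simply because it carries those components.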
What I haven't found discussed so much elsewhere is how much this eases maintenance even in scenarios when there are no concept-breaking design changes that we failed to anticipate. Even ignoring the flexibility of the ECS, it can really simplify things when your codebase reaches a certain scale.
Turning Objects Into Data
In a previous OOP-heavy codebase where I saw the difficulty of maintaining a codebase closer to the first graph above, the amount of code required exploded because the analogical Car in the diagram had to be built as a completely separate subtype (class) implementing multiple interfaces. So we had an explosive number of objects in the system: a separate object for point lights from directional lights, a separate object for a fish eye camera from another, etc. We had thousands of objects implementing a few dozen abstract interfaces in endless combinations.
By comparison, the ECS required only hundreds, and we were able to do exactly the same things as before using a small fraction of the code, because it turned the analogical Car entity into something that no longer requires its own class. It turns into a simple collection of component data as a generalized instance of just one Entity type.
OOP Alternatives
So there are cases like this where OOP applied in excess at the broadest level of the design can start to really degrade maintainability. At the broadest birds-eye view of your system, it can help to flatten it and not try to model it so "deep" with objects interacting with objects interacting with objects, however abstractly.
Comparing the two systems I worked on in the past and now, the new one has more features but takes hundreds of thousands of LOC. The former required over 20 million LOC. Of course it's not the fairest comparison since the former one had a huge legacy, but if you take a slice of the two systems which are functionally quite equal without the legacy baggage (at least about as close to equal as we might get), the ECS takes a small fraction of the code to do the same thing, and partly because it dramatically reduces the number of classes there are in the system by turning them into collections (entities) of raw data (components) with hefty systems to process them instead of a boatload of small/medium objects.
"Are there any scenarios where a truly non-OOP paradigm is actually a better choice for a large-scale solution? Or is that unheard of these days?"
It's far from unheard of. The system I'm describing above, for example, is widely used in games. It's quite rare in my field (most of the architectures in my field are COM-like with pure interfaces, and that's the type of architecture I worked on in the past), but I've found that peering over at what game developers are doing when designing an architecture made a world of difference in being able to create something that still remains very comprehensible as it grows and grows.
That said, some people consider ECS to be a type of object-oriented programming on its own. If so, it doesn't resemble OOP of a kind most of us would think of, since data (components and entities to compose them) and functionality (systems) are separated. It requires abandoning encapsulation at the broad system level which is often considered one of the most fundamental aspects of OOP.
High-Level Coding
"But it seems to me that usually the pieces of higher level solutions are almost always put together in an OOP fashion."
If you can piece together an application with very high-level code, then it tends to be rather small or medium in scale as far as the code your team has to maintain and can probably be assembled very effectively using OOP.
In my field in VFX, we often have to do things that are relatively low-level like raytracing, image processing, mesh processing, fluid dynamics, etc, and can't just piece these together from third party products since we're actually competing more in terms of what we can do at the low-level (users get more excited about cutting-edge, competitive production rendering improvements than, say, a nicer GUI). So there can be lots and lots of code ranging from very low-level shuffling of bits and bytes to very high-level code that scripters write through embedded scripting languages.
Interweb of Communication
But there comes a point with a large enough scale with any type of application, high-level or low-level or a combo, that revolves around a very complex central application state where I've found it no longer useful to try to encapsulate everything into objects. Doing so tends to multiply complexity and the difficulty to reason about what goes on due to the multiplied amount of interaction that goes on between everything. It no longer becomes so easy to reason about thousands of ecosystems talking to each other if there isn't a breaking point at a large enough scale where we stop modeling each thing as encapsulated ecosystems that have to talk to each other. Even if each one is individually simple, everything taken in as a whole can start to more than overwhelm the mind, and we often have to take a whole lot of that in to make changes and add new features and debug things and so forth if you try to revolve the design of an entire large-scale system solely around OOP principles. It can help to break free of encapsulation at some scale for at least some domains.
At that point it's not necessarily so useful anymore to, say, have a physics system encapsulate its own data (otherwise many things could want to talk to it and retrieve that data as well as initialize it with the appropriate input data), and that's where I found this alternative through ECS so helpful, since it turns the analogical physics system, and all such hefty systems, into a "central database transformer" or a "central database reader which outputs something new" which can now be oblivious about each other. Each system then starts to resemble a process in a flat pipeline more than an object which forms a node in a very complex graph of communication.

Are embedded developers more conservative than their desktop brethren? [closed]

I've been in the embedded space for a while now, and it seems that most programmers I talk to seem to be doing things pretty much the same way it was done 15 years or more ago: Waterfall(ish) Development, command line tools and a small group uses lint.
Contrast this with the server/desktop environment, where there seems to be lots of activity related to all sorts of facets of programming:
XP, Scrum, Iterative, Lean/Agile
Continuous Integration
Automated Builds
Automated Unit Testing Frameworks
Refactoring tool support
Is it just that the embedded environment makes it more difficult to implement new practices or tools?
Is it that the mindset of embedded programmers steers them away from new tools/concepts?
Is it that management in the typical embedded industry is behind the curve compared to IT focused fields?
I do realize that this is a generalization, and some embedded projects do use Scrum, Agile, CI, Automated Builds (in fact I worked at a company that had that in place since the 80s). But my impression is that it is a very small percentage.
We are all used to the fact that our desktop PC crashes once in a while (or at least an application on the desktop suddenly disappears). It's no big deal. The next patch will fix it.
In the embedded space, you are building something which can't be patched. Lives can depend on your device (in a car, an elevator or a medical system). Most devices are installed and then must run unattended for years. So embedded people tend to be very conservative. TCP/IP is often "too modern". They stick to their trusty serial bus with a communication "stack" that is roughly 50 lines of assembler code.
What's worse, you simply don't have the abundance of space on the device which means you can't use one of the latest programming languages which make TDD and automated builds a bliss.
Next, a lot of embedded development environments are proprietary. If your supplier doesn't support it, you won't get it. Linux has started to weaken this in the past years but a whole lot of devices are not powerful enough to run Linux, yet. And even if they were, the CPU power would be used for something else instead of running a fancy OS which comes with source.
So yes, there are powerful forces working in the background to keep the embedded space where it is.
Are embedded developers more conservative than their desktop brethren?
Yes, because they are more concerned with the consequences of making errors. It’s a big deal to patch an embedded device. Not so much for a desktop app.
Waterfallish development is necessary in the embedded world because you are generally building hardware at the same time as the software. You need to know as soon as possible how much memory, how much processor speed, how big a flash, what if any special hardware is necessary, etc. The hardware design can’t complete until you know these answers. Once you decide, that is pretty much it. The lead time for redoing a board is far too long. If you mess up then the software is going to have to work around any shortcomings. Not usually an ideal situation.
As for the tools, that is largely based on what the supplier provides and any biases of the developers. On some projects I have used XP Embedded and got pretty much everything that the desktop developer gets.
XP, Scrum, Iterative, Lean/Agile:
Since most of the design is done up front (by necessity), and you usually don’t have working hardware when it is time to code, the quick turn-around processes don’t really provide much benefit.
Continuous Integration/Automated Builds
Nice to have, but not really necessary. What…it takes about 15 seconds to open the IDE and press the compile button.
Automated Unit Testing
No reason why this shouldn't be done, but only part of the code can truly be automatically tested because the other part is either hardware dependent or has some other dependencies like timing. So you can't really tell if the code is working by the automated tests.
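One common way to enlarge the automatically testable part is to keep the decision logic free of direct hardware access, so that it can run on the host against a fake input. The following is only a minimal sketch of that idea, written in Python for brevity (real firmware would more likely be C); the `Debouncer` class and the fake pin reader are hypothetical names made up for illustration. It does nothing about the timing problems mentioned above, which still need testing on the target.

```python
class Debouncer:
    """Accept a new input level only after `threshold` consecutive
    samples that disagree with the current state (hypothetical example)."""

    def __init__(self, threshold=3, initial=False):
        self.threshold = threshold
        self.state = initial
        self._count = 0

    def update(self, raw):
        if raw == self.state:
            self._count = 0              # sample agrees: reset the counter
        else:
            self._count += 1             # one more disagreeing sample in a row
            if self._count >= self.threshold:
                self.state = raw         # enough evidence: switch state
                self._count = 0
        return self.state


def read_samples(read_pin, debouncer, n):
    """Poll `read_pin` n times; on the target this would read a real input."""
    return [debouncer.update(read_pin()) for _ in range(n)]


# Host-side test: the "pin" is just a scripted sequence of samples.
fake_pin = iter([True, False, True, True, True]).__next__
history = read_samples(fake_pin, Debouncer(threshold=3), 5)
```

Only `read_pin` touches hardware, so everything else can be exercised by an ordinary unit test runner on the development machine.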
Refactoring Tool Support
The product of embedded processor vendors is the processor. They provide the IDE support in order to encourage you to purchase their processor. They couldn't possibly afford to pay for a Visual Studio sized development team in order to add all the bells and whistles to an IDE which isn't even their product.
These are some reasons I can think of:
Embedded teams are usually smaller than desktop/Web teams. Code base is smaller.
System testing is much more important than unit testing. The software needs to be tested together with hardware. Automated testing is not possible and can only be applied to a small fraction of the code base.
Embedded engineers have a different skill set than software engineers. They interact with hardware, know how to use an oscilloscope and a logic analyzer. Usually, the difficult part of their job is to find a glitch in the hardware. They do not have the time to adopt modern software methodologies.
Embedded programmers are mostly electrical engineers, not computer scientists or software engineers.
They excel in their field of expertise. They bring a slower more methodical approach than most computer programmers. When it comes to programming firmware, electrical engineers know just enough to be dangerous.
Here are some of the things I've noticed electrical engineers doing in C:
All code in ONE single file
Math like variable names: x, y, z
No or missing indentation
No standard comment headers
No comments at all
Too many comments
In their defense EE's didn't train to be computer programmers, it's not their job. I think software is the hardest part of creating embedded devices. Designing PCBs and choosing components requires skill but pales in comparison to the complexity of 10,000 lines of code.
Embedded programmers also have to deal with IDEs that look and behave like the IDEs of the 90s.
MPLAB
AVR Studio
Is it just that the embedded environment makes it more difficult to implement new practices or tools?
It's partly a matter of scale. Software is NOT the product, the product is the product. However, there are thousands of different types of microcontrollers and microprocessors out there, and the most popular thousand have 3-4 different compilers that aren't completely compatible.
So a given tool is only going to be used by a few hundred or thousand engineers.
In Windows development, however, there are millions of programmers of many levels - the tools produce software directly which is the product, and so it's going to get more eyeballs, and more money.
Each new product that an engineer puts out might have a different processor.
Is it that the mindset of embedded programmers steers them away from new tools/concepts?
Embedded programmers are generally software or firmware engineers, as opposed to programmers. Engineering implies a certain amount of design, design analysis, and design proof prior to implementation - in other words a ton of work is done before the first line of code is written, and the documentation, ideally, is specific enough that implementation is merely turning pseudocode-like documentation into compilable code.
New tools and concepts are needed in the design phase, not the implementation phase. An IDE with IntelliSense may be nice, but by the time the code is being written it's useless cruft - they already know what they need.
CAD - computer aided design - tools are being developed for firmware engineers that are used in the design phase to develop models and simulations that are directly turned into code. MATLAB and Simulink are good examples of this. The system as a whole is designed.
In fact, one might wonder why software developers are still writing code while the engineers are making data/program flow charts and state machine diagrams. Why is UML uptake so slow in the application world? It sounds like application developers can use some of the tools in common use among embedded systems engineers...
Is it that management in the typical embedded industry is behind the curve compared to IT focused fields?
Actually, it's likely the reverse. When a project starts the engineers have to pick the processor.
The processor manufacturers get less money on older chips, so they pitch the latest and greatest, and they are generally cheaper overall than the chips used in the previous design (either by die shrinks, more integration, etc).
So the design is actually using the latest and greatest chips.
The downside is that the compiler and tools are often immature. They can only build so much on the older tools, and since the target moves with each new processor, they can't focus on a lot of the nice features application programmers might like. Especially since many of those features won't be useful to an embedded engineer.
There are many other factors, some of which are enumerated by other answers, but it's really a different field even though they both involve programming.
-Adam
I would also add a couple of points here:
In general embedded projects tend to be smaller than desktop projects. This decreases the need for very elaborate software processes.
Requirements for embedded projects are often precise and better defined. Therefore Scrum and agile are not so crucial.
Finally, embedded projects are generally a mix of software and hardware. With the software being only a part of the project, embedded developers invest less time in software processes.
I agree with much that's been written here:
Old tools without the bells and whistles (far fewer refactorings available due to C/C++'s preprocessor directives, if any at all) (time consuming to choose a unit test framework vs simply using JUnit).
It's true that waterfall feels more efficient. If I'm going to open the hood and get into a hard-to-access place, I'll want to do as much as I can while I'm there, rather than exiting and closing the hood after each task just to open it again. The idea that creating the most important features first allows you the option of shipping when promised instead of going late can also be hard to grasp when you believe nothing is optional, which might be true. IME, though, when the deadline looms something always becomes unnecessary.
Less visibility into the system makes it riskier to revisit existing code to refactor or change functionality. There are often timing issues, which automated tests running on the host using stubs and mocks won't catch. It can be hard for someone who's been bitten by these issues to take a different perspective.
I'll add one more; the language of agile/scrum is in workstation programmer's terms. To an embedded developer who knows just enough C to get the job done, what is a class, object, or method? When the "user" is typically regarded as a physical person clicking and typing, and the product has no person user interface, it's easy to dismiss the idea as Not Applicable. This may change with James Grenning's forthcoming book about TDD in C. I've been reading the beta ebook and it's quite good.
I would say it's more lack of good toolsets. It's really frustrating when you want to use C++ for its compile-time features not present in C (templates, namespaces, object-orientedness, etc) rather than its run-time features (exceptions, virtual functions) -- but the device manufacturers & 3rd parties just give you a C compiler, not C++. This probably results more from market size (hundreds of millions of PCs running Windows, with hundreds of thousands or even millions of developers -- vs. hundreds of thousands of Chip X, with hundreds or low thousands of developers) than from device capability.
edit: w/r/t robustness: there are different markets out there. The car/elevator/aeronautics/medical device market is going to have to be rigorous about getting rid of bugs. Other markets (toys, MP3 players, & other consumer electronics) can be less rigorous, especially if it's possible to upgrade code in the field. ("Oops! We're sorry we deleted your music library! We just fixed that bug, you can grab the latest release at our website at your convenience!")
I'd say different sorts of problem environments.
The biggest problem with the waterfall methodology is that requirements change. In every environment I've been in, there has been at least the likelihood of a requirements change, which means that the successful methodologies are those that keep flexibility as long as possible. Even if the customer has signed off in blood, and stands to forfeit his left hand if he suggests a change, there are changes coming in the future.
In embedded programming, it is possible to nail the requirements down up front. They come from the behavior of the system as a whole, and engineers are good at nailing down system requirements. Nobody's going to come in halfway through and say that the user now wants the pacemaker to deliver syncopated impulses while the recipient is dancing.
Once the requirements are frozen beyond thawing, which never happens in software designed for human use, waterfall is a very efficient methodology. The team proceeds from well-specified requirements to overall design, then detailed design, then coding, verifying all the way that the stages are done correctly. Then it's time to debug the code (since it's never perfect when written), and final tests to make sure the code meets the requirements.
I would also posit that some fields are inherently conservative. The transportation industry for example, where trains and planes may have life spans of 30 years or so. Customers tend to require tried and true practices, probably derived from IEEE. Waterfall is what customers know, waterfall is what customers demand.

Practices for programming in a scientific environment? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Background
Last year, I did an internship in a physics research group at a university. In this group, we mostly used LabVIEW to write programs for controlling our setups, doing data acquisition and analyzing our data. For the first two purposes, that works quite OK, but for data analysis, it's a real pain. On top of that, everyone was mostly self-taught, so code that was written was generally quite a mess (no wonder that every PhD quickly decided to rewrite everything from scratch). Version control was unknown, and impossible to set up because of strict software and network regulations from the IT department.
Now, things actually worked out surprisingly OK, but how do people in the natural sciences do their software development?
Questions
Some concrete questions:
What languages/environments have you used for developing scientific software, especially data analysis? What libraries? (for example, what do you use for plotting?)
Was there any training for people without any significant background in programming?
Did you have anything like version control, and bug tracking?
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (especially physicists are stubborn people!)
Summary of answers thus far
The answers (or my interpretation of them) thus far: (2008-10-11)
Languages/packages that seem to be the most widely used:
LabVIEW
Python
with SciPy, NumPy, PyLab, etc. (See also Brandon's reply for downloads and links)
C/C++
MATLAB
Version control is used by nearly all respondents; bug tracking and other processes are much less common.
The Software Carpentry course is a good way to teach programming and development techniques to scientists.
How to improve things?
Don't force people to follow strict protocols.
Set up an environment yourself, and show the benefits to others. Help them to start working with version control, bug tracking, etc. themselves.
Reviewing other people's code can help, but be aware that not everyone may appreciate that.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I used to work for Enthought, the primary corporate sponsor of SciPy. We collaborated with scientists from the companies that contracted Enthought for custom software development. Python/SciPy seemed to be a comfortable environment for scientists. It's much less intimidating to get started with than say C++ or Java if you're a scientist without a software background.
The Enthought Python Distribution comes with all the scientific computing libraries including analysis, plotting, 3D visualization, etc.
Was there any training for people without any significant background in programming?
Enthought does offer SciPy training and the SciPy community is pretty good about answering questions on the mailing lists.
Did you have anything like version control, bug tracking?
Yes, and yes (Subversion and Trac). Since we were working collaboratively with the scientists (and typically remotely from them), version control and bug tracking were essential. It took some coaching to get some scientists to internalize the benefits of version control.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
Make sure they are familiarized with the tool chain. It takes an investment up front, but it will make them feel less inclined to reject it in favor of something more familiar (Excel). When the tools fail them (and they will), make sure they have a place to go for help — mailing lists, user groups, other scientists and software developers in the organization. The more help there is to get them back to doing physics the better.
The course Software Carpentry is aimed specifically at people doing scientific computing and aims to teach the basics and lessons of software engineering, and how best to apply them to projects.
It covers topics like version control, debugging, testing, scripting and various other issues.
I've listened to about 8 or 9 of the lectures and think it is to be highly recommended.
Edit: The MP3s of the lectures are available as well.
Nuclear/particle physics here.
Major programming work used to be done mostly in Fortran using CERNLIB (PAW, MINUIT, ...) and GEANT3; recently it has mostly been done in C++ with ROOT and Geant4. There are a number of other libraries and tools in specialized use, and LabVIEW sees some use here and there.
Data acquisition in my end of this business has often meant fairly low level work. Often in C, sometimes even in assembly, but this is dying out as the hardware gets more capable. On the other hand, many of the boards are now built with FPGAs which need gate twiddling...
One-offs, graphical interfaces, etc. use almost anything (Tcl/Tk used to be big, and I've been seeing more Perl/Tk and Python/Tk lately) including a number of packages that exist mostly inside the particle physics community.
Many people writing code have little or no formal training, and process is transmitted very unevenly by oral tradition, but most of the software group leaders take process seriously and read as much as necessary to make up their deficiencies in this area.
Version control for the main tools is ubiquitous. But many individual programmers neglect it for their smaller tasks. Formal bug tracking tools are less common, as are nightly builds, unit testing, and regression tests.
To improve things:
Get on the good side of the local software leaders
Implement the process you want to use in your own area, and encourage those you let in to use it too.
Wait. Physicists are empirical people. If it helps, they will (eventually!) notice.
One more suggestion for improving things.
Put a little time in to helping anyone you work directly with. Review their code. Tell them about algorithmic complexity/code generation/DRY or whatever basic thing they never learned because some professor threw a Fortran book at them once and said "make it work". Indoctrinate them on process issues. They are smart people, and they will learn if you give them a chance.
This might be slightly tangential, but hopefully relevant.
I used to work for National Instruments, R&D, where I wrote software for NI RF & Communication toolkits. We used LabVIEW quite a bit, and here are the practices we followed:
Source control. NI uses Perforce. We did the regular thing - dev/trunk branches, continuous integration, the works.
We wrote automated test suites.
We had a few people who came in with a background in signal processing and communication. We used to have regular code reviews, and best practices documents to make sure their code was up to the mark.
Despite the code reviews, there were a few occasions when "software guys", like me had to rewrite some of this code for efficiency.
I know exactly what you mean about stubborn people! We had folks who used to think that pointing out a potential performance improvement in their code was a direct personal insult! It goes without saying that this calls for good management. I thought the best way to deal with these folks was to go slowly, not press too hard for changes and, if necessary, be prepared to do the dirty work. [Example: write a test suite for their code].
I'm not exactly a 'natural' scientist (I study transportation) but am an academic who writes a lot of my own software for data analysis. I try to write as much as I can in Python, but sometimes I'm forced to use other languages when I'm working on extending or customizing an existing software tool. There is very little programming training in my field. Most folks are either self-taught, or learned their programming skills from classes taken previously or outside the discipline.
I'm a big fan of version control. I used Vault running on my home server for all the code for my dissertation. Right now I'm trying to get the department to set up a Subversion server, but my guess is I will be the only one who uses it, at least at first. I've played around a bit with FogBugs, but unlike version control, I don't think that's nearly as useful for a one-man team.
As for encouraging others to use version control and the like, that's really the problem I'm facing now. I'm planning on forcing my grad students to use it on research projects they're doing for me, and encouraging them to use it for their own research. If I teach a class involving programming, I'll probably force the students to use version control there too (grading them on what's in the repository). As far as my colleagues and their grad students go, all I can really do is make a server available and rely on gentle persuasion and setting a good example. Frankly, at this point I think it's more important to get them doing regular backups than get them on source control (some folks are carrying around the only copy of their research data on USB flash drives).
1.) Scripting languages are popular these days for most things due to better hardware. Perl/Python/Lisp are prevalent for lightweight applications (automation, light computation); I see a lot of Perl at my work (computational EM) since we like Unix/Linux. For performance stuff, C/C++/Fortran are typically used. For parallel computing, well, we usually manually parallelize runs in EM as opposed to having a program implicitly do it (i.e., split up the jobs by look angle when computing radar cross sections).
2.) We just kind of throw people into the mix here. A lot of the code we have is very messy, but scientists are typically a scatterbrained bunch that don't mind that sort of thing. Not ideal, but we have things to deliver and we're severely understaffed. We're slowly getting better.
3.) We use SVN; however, we do not have bug tracking software. About as good as it gets for us is a txt file that tells you where specific bugs are.
4.) My suggestion for implementing best practices for scientists: do it slowly. As scientists, we typically don't ship products. No one in science makes a name for himself by having clean, maintainable code. They get recognition from the results of that code, typically. They need to see justification for spending time on learning software practices. Slowly introduce new concepts and try to get them to follow; they're scientists, so after their own empirical evidence confirms the usefulness of things like version control, they will begin to use it all the time!
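The manual parallelization from point 1 (splitting jobs by look angle) could be sketched roughly as follows. This is an illustration only: `solve_for_angle` is a hypothetical stand-in for one expensive solver run, and threads are used just to keep the sketch self-contained; real CPU-bound runs would more likely be separate processes or batch jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_angles(angles, n_jobs):
    """Deal the look angles round-robin into n_jobs independent lists."""
    return [angles[i::n_jobs] for i in range(n_jobs)]


def solve_for_angle(angle_deg):
    # Hypothetical placeholder for one solver run at a single look angle.
    return angle_deg * 2.0


def run_job(job_angles):
    """One job: solve every angle assigned to it, in order."""
    return [(angle, solve_for_angle(angle)) for angle in job_angles]


def run_all(angles, n_jobs=4):
    jobs = split_angles(angles, n_jobs)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        partial_results = list(pool.map(run_job, jobs))
    # Merge the per-job results back into one list ordered by angle.
    merged = [pair for part in partial_results for pair in part]
    return sorted(merged)


results = run_all(list(range(0, 360, 45)), n_jobs=3)
```

Because each angle is independent, the jobs share no state and the only coordination needed is the final merge.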
I'd highly recommend reading "What Every Computer Scientist Should Know About Floating-Point Arithmetic". A lot of problems I encounter on a regular basis come from issues with floating point programming.
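Two of the most common floating-point surprises can be reproduced in a few lines. This is a small sketch in Python using only the standard `math` module; the specific values are chosen for illustration.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their sum is
# not exactly the double closest to 0.3.
total = 0.1 + 0.2
exact_equal = (total == 0.3)              # False
close_enough = math.isclose(total, 0.3)   # True (default rel_tol=1e-09)

# Adding values one at a time accumulates rounding error.
values = [0.1] * 10
naive_sum = 0.0
for value in values:
    naive_sum += value                    # ends up slightly below 1.0

# math.fsum tracks the lost low-order bits and returns 1.0 here.
careful_sum = math.fsum(values)
```

Comparing with a tolerance rather than `==`, and using a compensated sum for long accumulations, avoids both problems.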
I am a physicist working in the field of condensed matter physics, building classical and quantum models.
Languages:
C++ -- very versatile: can be used for anything, good speed, but it can be a bit inconvenient when it comes to MPI
Octave -- good for some supplementary calculations, very convenient and productive
Libraries:
Armadillo/Blitz++ -- fast array/matrix/cube abstractions for C++
Eigen/Armadillo -- linear algebra
GSL -- to use with C
LAPACK/BLAS/ATLAS -- extremely big and fast, but less convenient (and written in FORTRAN)
Graphics:
GNUPlot -- it has very clean and neat output, but not that productive sometimes
Origin -- very convenient for plotting
Development tools:
Vim + plugins -- it works great for me
GDB -- a great debugging tool when working with C/C++
Code::Blocks -- I used it for some time and found it quite comfortable, but Vim is still better in my opinion.
I work as a physicist in a UK university.
Perhaps I should emphasise that different areas of research have different emphasis on programming. Particle physicists (like dmckee) do computational modelling almost exclusively and may collaborate on large software projects, whereas people in fields like my own (condensed matter) write code relatively infrequently. I suspect most scientists fall into the latter camp. I would say coding skills are usually seen as useful in physics, but not essential, much like physics/maths skills are seen as useful for programmers but not essential. With this in mind...
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Commonly, data analysis and plotting are done using generic data analysis packages such as IGOR Pro, Origin and KaleidaGraph, which can be thought of as 'Excel plus'. These packages typically have a scripting language that can be used to automate tasks. More specialist analysis may have a dedicated utility for the job, which generally was written a long time ago, has no available source and is pretty buggy. Some more techie types might use the languages that have been mentioned (Python, R, MATLAB, with Gnuplot for plotting).
Control software is commonly done in LabVIEW, although we actually use Delphi which is somewhat unusual.
Was there any training for people without any significant background in programming?
I've been to seminars on grid computing, 3D visualisation, learning Boost etc. given by both universities I've been at. As an undergraduate we were taught VBA for Excel and MatLab but C/MatLab/LabVIEW is more common.
Did you have anything like version control, bug tracking?
No, although people do have personal development setups. Our code base is in a shared folder on a 'server' which is kept current with a synching tool.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
One step at a time! I am trying to replace the shared folder with something a bit more solid; perhaps finding an SVN client which mimics the current synching tool's behaviour would help.
I'd say though on the whole, for most natural science projects, time is generally better spent doing research!
Ex-academic physicist and now industrial physicist UK here:
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I mainly use MATLAB these days (easy to access visualisation functions and maths). I used to use Fortran a lot and IDL. I have used C (but I'm more a reader than a writer of C), Excel macros (ugly and confusing). I'm currently needing to be able to read Java and C++ (but I can't really program in them) and I've hacked Python as well. For my own entertainment I'm now doing some programming in C# (mainly to get portability / low cost / pretty interfaces). I can write Fortran with pretty much any language I'm presented with ;-)
Was there any training for people without any significant background in programming?
Most (all?) undergraduate physics courses will have a small programming course, usually on C, Fortran or MATLAB, but it's the real basics. I'd really like to have had some training in software engineering at some point (revision control / testing / designing medium scale systems).
Did you have anything like version control, bug tracking?
I started using Subversion / TortoiseSVN relatively recently. Groups I've worked with in the past have used revision control. I don't know any academic group which uses formal bug tracking software. I still don't use any sort of systematic testing.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
I would try to introduce some software engineering ideas at undergraduate level and then reinforce them by practice at graduate level, also provide pointers to resources like the Software Carpentry course mentioned above.
I'd expect that a significant fraction of academic physicists will be writing software (not necessarily all though) and they are in dire need of at least an introduction to ideas in software engineering.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Python, NumPy and pylab (plotting).
Was there any training for people without any significant background in programming?
No, but I was working in a multimedia research lab, so almost everybody had a computer science background.
Did you have anything like version control, bug tracking?
Yes, Subversion for version control, Trac for bug tracking and wiki. You can get free bug tracker/version control hosting from http://www.assembla.com/ if their TOS fits your project.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!).
Make sure the infrastructure is set up and well maintained and try to sell the benefits of source control.
I'm a statistician at a university in the UK. Generally people here use R for data analysis, it's fairly easy to learn if you know C/Perl. Its real power is in the way you can import and modify data interactively. It's very easy to take a number of say CSV (or Excel) files and merge them, create new columns based on others and then throw that into a GLM, GAM or some other model. Plotting is trivial too and doesn't require knowledge of a whole new language (like PGPLOT or GNUPLOT.) Of course, you also have the advantage of having a bunch of built-in features (from simple things like mean, standard deviation etc all the way to neural networks, splines and GL plotting.)
Having said this, there are a couple of issues. With very large datasets R can become very slow (I've only really seen this with >50,000x30 datasets) and since it's interpreted you don't get the advantage of Fortran/C in this respect. But, you can (very easily) get R to call C and Fortran shared libraries (either from something like netlib or ones you've written yourself.) So, a usual workflow would be to:
Work out what to do.
Prototype the code in R.
Run some preliminary analyses.
Re-write the slow code into C or Fortran and call that from R.
Which works very well for me.
I'm one of the only people in my department (of >100 people) using version control (in my case using git with github.com.) This is rather worrying, but they just don't seem to be keen on trying it out and are content with passing zip files around (yuck.)
My suggestion would be to continue using LabVIEW for the acquisition (and perhaps trying to get your co-workers to agree on a toolset for acquisition and making it available for all) and then move to exporting the data into a CSV (or similar) and doing the analysis in R. There's really very little point in re-inventing the wheel in this respect.
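The "export to CSV, then analyze elsewhere" step can be sketched as below. Note that this uses Python's standard library rather than R, purely as an illustration of the hand-off; the column names `time_s` and `voltage_v` and the sample values are made up, and an in-memory string stands in for the exported file.

```python
import csv
import io
import statistics

# Stand-in for a file exported by the acquisition program
# (hypothetical columns and values).
exported = io.StringIO(
    "time_s,voltage_v\n"
    "0.0,1.02\n"
    "0.1,0.98\n"
    "0.2,1.01\n"
    "0.3,0.99\n"
)

rows = list(csv.DictReader(exported))
voltages = [float(row["voltage_v"]) for row in rows]

mean_v = statistics.mean(voltages)
stdev_v = statistics.stdev(voltages)  # sample standard deviation
print(f"n={len(voltages)} mean={mean_v:.3f} V stdev={stdev_v:.3f} V")
```

The point of the design is that the acquisition side only has to agree on a plain-text format; whatever reads the file afterwards can be swapped freely.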
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
My undergraduate physics department taught LabVIEW classes and used it extensively in its research projects.
The other alternative is MATLAB, in which I have no experience. There are camps for either product; each has its own advantages/disadvantages. Depending on what kind of problems you need to solve, one package may be more preferable than the other.
Regarding data analysis, you can use whatever kind of number cruncher you want. Ideally, you can do the hard calculations in language X and format the output to plot nicely in Excel, Mathcad, Mathematica, or whatever the flavor du jour plotting system is. Don't expect standardization here.
Did you have anything like version control, bug tracking?
Looking back, we didn't, and it would have been easier for us all if we did. Nothing like breaking everything and struggling for hours to fix it!
Definitely use source control for any common code. Encourage individuals to write their code in a manner that could be made more generic. This is really just coding best practices. Really, you should have them teaching (or taking) a computer science class so they can get the basics.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
There is a clear split between data acquisition (DAQ) and data analysis. This means it's possible to standardize on the DAQ and then allow the scientists to play with the data in the program of their choice.
Another good option is Scilab. It has graphic modules à la LabVIEW, it has its own programming language and you can also embed Fortran and C code, for example. It's being used in public and private sectors, including big industrial companies. And it's free.
About versioning, some prefer Mercurial, as it gives more liberties managing and defining the repositories. I have no experience with it, however.
For plotting I use Matplotlib. I will soon have to make animations, and I've seen good results using MEncoder. Here is an example including an audio track.
Finally, I suggest going modular, that is, trying to keep the main pieces of code in different files, so that code revision, understanding, maintenance and improvement will be easier. I have written, for example, a Python module for file integrity testing, another for image processing sequences, etc.
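To give an idea of how small such a module can be, here is a minimal sketch of a file integrity check using only the standard library. It is not the module mentioned above; the function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_is_intact(path, expected_hexdigest):
    """Compare a file's digest with a previously recorded one."""
    return file_sha256(path) == expected_hexdigest.lower()
```

Kept in its own file, this can be imported by any analysis script that needs to confirm a data file has not changed since its checksum was recorded.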
You should also consider developing with a debugger that allows you to check variable contents at settable breakpoints in the code, instead of using print lines.
I have used Eclipse for Python and Fortran development (although I got a false bug compiling a short Fortran program with it, but it may have been a bad configuration) and I'm starting to use the Eric IDE for Python. It allows you to debug, manage versioning with SVN, it has an embedded console, it can do refactoring with Bicycle Repair Man (it can use another one, too), you have Unittest, etc. A lighter alternative for Python is IDLE, included with Python since version 2.3.
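For Python, a breakpoint does not even need an IDE. The sketch below assumes Python 3.7 or later, where the built-in breakpoint() drops into the pdb debugger; the running_mean function and its debug flag are made up for illustration.

```python
def running_mean(values, debug=False):
    """Return the running mean of a sequence of numbers."""
    total = 0.0
    means = []
    for count, value in enumerate(values, start=1):
        total += value
        if debug:
            # Pauses in pdb; inspect total, count and value,
            # then type "c" to continue to the next iteration.
            breakpoint()
        means.append(total / count)
    return means
```

Calling running_mean(data, debug=True) stops inside the loop on every pass, so the intermediate values can be checked without scattering print lines through the code.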
As a few hints, I also suggest:
Not using single-character variables. When you want to search for occurrences, you will get results everywhere. Some argue that a decent IDE makes this easier, but then you will depend on having permanent access to the IDE. Even using ii, jj and kk can be enough, although this choice will depend on your language. (Double vowels would be less useful if code comments are made in Estonian, for instance.)
Commenting the code from the very beginning.
For critical applications sometimes it's better to rely on older language/compiler versions (major releases), more stable and better debugged.
Of course you can have more optimized code in later versions, fixed bugs, etc., but I'm talking about using Fortran 95 instead of 2003, Python 2.5.4 instead of 3.0, or so. (Especially when a new version breaks backwards compatibility.) Lots of improvements usually introduce lots of bugs. Still, this will depend on specific application cases!
Note that this is a personal choice, many people could argue against this.
Use redundant and automated backup! (With version control.)
Definitely, use Subversion to keep current, work-in-progress, and stable snapshot copies of source code. This includes C++, Java etc. for homegrown software tools, and quickie scripts for one-off processing.
With the strong leaning in science and applied engineering toward "lone cowboy" development methodology, the usual practice of organizing the repository into trunk, tag and whatever else it was - don't bother! Scientists and their lab technicians like to twirl knobs, wiggle electrodes and chase vacuum leaks. It's enough of a job to get everyone to agree to, say Python/NumPy or follow some naming convention; forget trying to make them follow arcane software developer practices and conventions.
For source code management, centralized systems such as Subversion are superior for scientific use due to the clear single point of truth (SPOT). Logging of changes and the ability to recall versions of any file, without having to chase down where to find something, has huge record-keeping advantages. Tools like Git and Monotone: oh my gosh the chaos I can imagine that would follow! Having clear-cut records of just what version of hack-job scripts were used while toying with the new sensor when that Higgs boson went by or that supernova blew up, will lead to happiness.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Languages I have used for numerics and scientific-related stuff:
C (slow development, too much debugging, almost impossible to write reusable code)
C++ (and I learned to hate it -- development isn't as slow as C, but can be a pain. Templates and classes were cool initially, but after a while I realized that I was fighting them all the time and finding workarounds for language design problems)
Common Lisp, which was OK, but not widely used for scientific computing. Not easy to integrate with C (compared to other languages), but it works
Scheme. This one became my personal choice.
My editor is Emacs, although I do use vim for quick stuff like editing configuration files.
For plotting, I usually generate a text file and feed it into gnuplot.
For data analysis, I usually generate a text file and use GNU R.
I see lots of people here using FORTRAN (mostly 77, but some 90), lots of Java and some Python. I don't like those, so I don't use them.
Was there any training for people without any significant background in programming?
I think this doesn't apply to me, since I graduated in CS -- but where I work there is no formal training, but people (Engineers, Physicists, Mathematicians) do help each other.
Did you have anything like version control, bug tracking?
Version control is absolutely important! I keep my code and data on three different machines, on two different sides of the world -- in Git repositories. I sync them all the time (so I have version control and backups!) I don't do bug tracking, although I may start doing that.
But my colleagues don't use a BTS or VCS at all.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
First, I'd give them as much freedom as possible. (In the University where I work I could choose between having someone install Ubuntu or Windows, or installing my own OS -- I chose to install my own. I don't have support from them and I'm responsible for anything that happens with my machines, including security issues, but I do whatever I want with the machine.)
Second, I'd see what they are used to, and make it work (need FORTRAN? We'll set it up. Need C++? No problem. Mathematica? OK, we'll buy a license). Then see how many of them would like to learn "additional tools" to help them be more productive (don't say "different" tools. Say "additional", so it won't seem like anyone will "lose" or "let go" or whatever). Start with editors, see if there are groups who would like to use VCS to sync their work (hey, you can stay home and send your code through SVN or Git -- wouldn't that be great?) and so on.
Don't impose -- show examples of how cool these tools are. Make data analysis using R, and show them how easy it was. Show nice graphics, and explain how you've created them (but start with simple examples, so you can quickly explain them).
I would suggest F# as a potential candidate for performing science-related manipulations given its strong semantic ties to mathematical constructs.
Also, its support for units of measure, as written about here, makes a lot of sense for ensuring proper translation between the mathematical model and the implementation source code.
First of all, I would definitely go with a scripting language to avoid having to explain a lot of extra things (for example manual memory management is - mostly - ok if you are writing low-level, performance sensitive stuff, but for somebody who just wants to use a computer as an upgraded scientific calculator it's definitely overkill). Also, look around if there is something specific for your domain (as is R for statistics). This has the advantage of already working with the concepts the users are familiar with and having specialized code for specific situations (for example calculating standard deviations, applying statistical tests, etc in the case of R).
If you wish to use a more generic scripting language, I would go with Python. Two things it has going for it are:
The interactive shell where you can experiment
Its clear (although sometimes lengthy) syntax
As an added advantage, it has libraries for most of the things you would want to do with it.
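As a small taste of the "upgraded scientific calculator" use, here is the kind of thing you can type straight into the interactive shell. It uses only the standard-library statistics module; the readings are made-up sample values.

```python
import statistics

# Made-up sample values, e.g. five repeated measurements.
readings = [9.79, 9.81, 9.80, 9.83, 9.78]

mean = statistics.mean(readings)
spread = statistics.stdev(readings)  # sample standard deviation

print(f"mean = {mean:.3f}, stdev = {spread:.3f}")
```

For anything heavier (arrays, fitting, plotting) the third-party libraries take over, but the workflow of typing a line and looking at the result stays the same.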
I'm no expert in this area, but I've always understood that this is what MATLAB was created for. There is a way to integrate MATLAB with SVN for source control as well.