Where can I find UML diagrams (instead of reinventing the wheel)? - repository

I am currently trying to draw a set of UML diagrams to represent products, offers, orders, deliveries and payments. These diagrams have probably been invented by a million developers before me.
Are there any efforts to standardize the modeling of such common things? Or even the modeling of specific domains (for example car-manufacturing).
Do you know if there is some sort of repository containing UML diagrams (class diagrams, sequence diagrams, state diagrams...)?

There is a movement to document (as opposed to standardize) models for certain domains. These are called analysis patterns, a term Martin Fowler came up with. He actually wrote a book called Analysis Patterns, and he has a dedicated section on his website where he presents some of these patterns accompanied by UML diagrams.
Maybe you'll find some inspiration there that will help you in modeling your domain. I've stressed the word inspiration because I think different businesses have different requirements even though they operate in the same domain, so the solutions you read about may not be appropriate for your problem.

There are many tools out there that do both - but they're generally not free!
Microsoft Visio does both and is extensible. For UML artefacts it comes with generators that produce VB/Java template code, but you can modify them to generate any code you like. Many Visio users have created models that can be used as templates.
Artisan Enterprize is by far the most powerful UML tool (but it's not cheap).
Some would argue that Rational Rose or RUP is the better tool.
But for car manufacturing and other similar real-world modelling, by far the best tool is MathWorks Simulink (and not just because it's one of the most expensive). It is by far the best tool because you can animate the model: you can prove the model works before generating the slick code (in whatever grammar/language/other models you care to push it into)!
You can obtain a student licence for around £180, with the 'real thing' pushing £4,000 (for car-related artefacts). The full product with all the trimmings is about £15k. Simulink is also extensible with a C-like language, and there is a .NET add-in plus APIs for a plethora of other languages. And, just like Visio, there is a worldwide community creating saleable, shareware and freeware real-world model templates. Many auto manufacturers worldwide are already using Simulink.

I think that MiniQuark's question is really good and will sooner or later be addressed by vendors such as Omondo, Rational (IBM), etc. Users don't just need tools; they need models out of the box, so they can simply add their business rules inside an existing, well-defined architecture. Why develop a new architecture from scratch if the job has already been done? In Java we use plenty of frameworks, existing methods, etc., so why not go one level higher and reuse architecture? It is impossible today to guess how a project will evolve, and new demands come in every day. We therefore need a stable architecture which has been tested previously and is extensible. I have seen so many projects start with a nice architecture, then realize in the middle of the project that it is not the best fit, and then change their architecture: renaming classes, splitting classes, creating packages, etc. After the first iteration it is already a real mess. Could you imagine what we found after 10 iterations? A total mess!
This mess would have been avoided by using a predefined model that had been tested previously, because the missing class or package would already have been created, and only a class rename would be needed for architecture purposes. Adding the business-rule methods would then finish the coding stage before deployment testing.
I think there is some confusion between patterns and the initial question, which is about UML model reusability.
There is today no reusable out-of-the-box model that has been developed. This is really strange, but the job has never been done, or at least never been shared.
Omondo has tried to launch an initiative, without real success. I have heard that they are working on hundreds of out-of-the-box models which will be open source and given to the community for free. I hope this gets done, because it is really important for me and would save me a lot of time at the beginning of a project.

Related

Features and Use Case Diagrams Vs Requirements and Use Cases

According to "Head First Object-Oriented Analysis and Design", Complex projects involves first finding a feature list -> drawing use case diagrams -> breaking into smaller modules before implementing object oriented design (requirements gathering -> use cases -> OO -> design patterns etc.)
I want to understand, what is the criteria for the size of project when feature lists and use case diagrams should be implemented before finding requirements and writing use cases?
I am particularly interested in how can this knowledge be applied to my real wold problems
Example, I am working on a UI that send instrument commands to the server and displays the response back from the server. I know from customer feedback that the UI should have the following things:
It should be able to let the user select an instrument from available list and send any custom command and display the result
It should be able to let the user select an instrument and a command from available list and display the result (create commands using drag and drops from given lists)
It should have the capability of creating macros
Is this UI project small enough to not have steps for gathering features and drawing use case diagrams? Do we go straight to categorizing the asks as requirements and start gathering requirements and writing use cases?
How would one go about breaking down a project of this nature to reduce it to its appropriate class diagrams?
I have tried treating the above-mentioned asks as features and then creating requirements, mainly around the different states the UI application could be in during its life cycle, but I am still not sure and unable to apply the teachings of the book to this project.
I haven't read the book, so I'm not sure what the author(s) of the book really wanted to emphasize here. But I assume that you misinterpreted it.
Without knowing the requirements there is no feature list. If you don't know what is needed then you can't say anything about the system's capabilities.
Gathering requirements is an iterative process. First you gather the high-level requirements in order to be able to start building a mental model of the system. This helps you start thinking about the supported features. Sharing your mental model and the exposed feature set of the system with the stakeholder initiates the next iteration.
Here you can start talking about actors, user journeys, use cases, etc. These mainly focus on the happy paths. As you have more and more iterations you will reach a point where you can start talking about edge and corner cases: What suboptimal cases can we foresee? What can we do (prevention, detection, mitigation)? How do they affect the system/actors/journeys?...
The better you understand the needs and circumstances, the better the design and implementation of the system could be.
UPDATE #1
Will we always have high-level and low-level (edge cases and detailed use cases) requirements, i.e. will we first need to make use case diagrams and then write individual detailed use cases?
There are a lot of factors which can influence this. Just to name a few:
Is it a system, submodule, or component design?
Is it a greenfield or a brownfield project?
Is the stakeholder experienced enough to know which information matters and which doesn't from the IT project perspective?
Does the architect / system designer have previous experience with the same domain?
Does wireframe or mockup exist at project kick-off?
Should the project satisfy special security, legal or governmental regulations?
etc...
In short, yes, there can be circumstances where you don't need several iterations, but based on my experience that's quite rare.

How to bind UML with code?

I am a beginner in UML and software analysis, and I do not understand how UML and diagrams can influence coding and software architecture when we can directly build the code and its database without diagrams.
I have read a lot of tutorials about the subject, but not enough to understand the utility of UML in coding.
I understand every diagram and its role; that is not my problem. What I do not yet understand is their role after the analysis and design phase.
So what is the role of UML in the coding phase of a software project?
Thank you.
The comment by @xmojmr already puts it right. UML creates a model (hence the M in UML) of a system. A model reduces information about a system to a level where it is a) manageable and b) complete. Human brains are not computers, and you need a means of communicating what the system is all about. You can do that as pure code, as a paper document, or as a UML model; a combination of all three is not uncommon. As long as you have tiny systems you can live with pure code and tools like Doxygen, but once it starts getting complex you need some handles. UML offers these to end users, architects, testers, developers, managers, etc. Along with UML you will also need a methodology: UML delivers the syntax for documenting a system, but you need some structure on top of that to write a nice novel.
UML-based models play an essential role for coding/implementing a software system in model-based (or model-driven) development. The basic idea is that you start making a model of your problem domain (the domain model), then you derive from it a platform-independent design model, which can be transformed into platform-specific implementation models (e.g. for Java- or C#-based platforms) that are finally encoded in the target languages.
The most prominent part of model-based development is the encoding of model classes (forming the model layer in an MVC architecture for apps) based on a data model (a UML class model) that has been derived from an information design model, which was obtained from a domain information model (where all these information/data models are UML class models).
You can find an instructive example of model-based development in my tutorial book Engineering Front-End Web Apps with Plain JavaScript.
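To make the idea of encoding a class model concrete, here is a minimal sketch in Python (my own illustration, not code from the book or from any particular tool): a hypothetical Book class whose attributes and range constraint are assumed to come from an information design model expressed as a UML class diagram.

    # Hypothetical model class derived from a UML class model:
    # Book(isbn: string, title: string, year: integer {year >= 1454}).
    class ConstraintViolation(Exception):
        """Raised when an attribute value violates a model constraint."""

    class Book:
        def __init__(self, isbn, title, year):
            self.isbn = isbn
            self.title = title
            self.year = year            # assignment goes through the checking setter

        @property
        def year(self):
            return self._year

        @year.setter
        def year(self, value):
            # Interval constraint taken from the (assumed) class model.
            if not isinstance(value, int) or value < 1454:
                raise ConstraintViolation("invalid year: %r" % (value,))
            self._year = value

    # Usage: a valid object is created, an invalid one is rejected.
    book = Book("123-456", "Some Title", 2000)
    try:
        Book("789-012", "Another Title", -1)
    except ConstraintViolation as err:
        print(err)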
This one is, in my view, a duplicate of that other question. It can't be flagged because there is no accepted answer. The related question on Meta Stack Exchange does not provide a clear solution to that situation.
I think my personal answer was relevant and is applicable to the current question.
To summarize: Martin Fowler considers the current uses of UML; I think he describes current practice. Perhaps these practices should evolve?
Perhaps the initial question would be the right place to discuss that?

What are the most practical Object-oriented software modeling methods in real world projects?

I want to develop a big project, but I really don't know what the best way is to model it. Do I even need to model my project?
What are the most practical OOP software modeling methods in real world projects? What are the best and most useful ones?
Many times you need to capture the complex structure of the classes in your OO system, so class diagrams from UML are used for modeling. You may also want to describe interactions between classes; for that, sequence diagrams are useful. There are other UML diagrams as well, and each has its purpose.
If you are looking for an approach to modeling, try the Unified Process, a development method created by the authors of UML; it uses UML quite heavily and also describes how UML can be used.
Agile methodologies are currently what is recommended. If you add a slice of UML on top, even better :-)
Modeling (design) is the most important part of every project.
In fact, as time goes by, we sacrifice performance to gain a higher level of design.
Why is the .NET framework popular (compared to older tools)? In most cases its libraries are wrappers over traditional Win32 APIs, a waste of performance, yet it provides a better design, which makes it easy to learn and use.
So if your project has a good design it will be easy to understand, develop, debug, maintain and extend.
Another example is OOP itself, which has classes, interfaces... and a bunch of constructor/destructor calls. OOP concepts are borrowed from psychiatry and the way human beings see the world.
Here are two different concepts:
1) Design methodology
2) Project management methodology
There are many of each, and I won't label them good or bad; each fits a scenario.
As for design methodology, I prefer DDD (Domain-Driven Design) as it maps to the industry's domain terminology and concepts. So if you have a decision problem about what to do when A->B->C happens, you can simply ask a domain professional and he will tell you what they do in the real world. DDD works well for industries old enough to have accumulated wisdom. I'm not going to write more about design since we don't know anything about the project.
Project management methodologies (like agile) are the way you build the building from the map (the design). The goal of project management is to use resources optimally (time, money, human resources...). This is done through a work breakdown structure and making work as parallel as possible. The best-known project management methodology is the traditional one, in which we do everything in sequence, as civil engineers do (foundation, structure, walls...). This was good for many centuries, until the last few decades (the software industry), since in traditional project management you know where you are, where you want to go, and how to get there. This way you can buy furniture for a home that is still just a plot of land!
The software industry has seen very rapid changes in tools and methods because it is young and no best practices had yet been founded on thousands of failed projects. Many times a project changes after it has started because of changes in development tools and frameworks. Another source of change is the scope of the project (where to go). Software is an intangible product, so you fall into the trap of time estimation easily. For software development, the best practices are iterative methodologies.
Iterative methodologies suggest a working but incomplete solution that you make more complete in the next iteration, rather than a non-working, partially complete one. This has a time overhead; in exchange, you are sure the solution works, and if there is a problem, you find it in the early stages. That's why we have nightly builds!
The best is Visual Studio 2010 Ultimate; others are too cumbersome. Otherwise use light tools like yUML - see http://askuml.com for samples.

Experiences with using Alloy in real-world projects

I have been interested in formal methods for some time. I have used formal methods to reason about some very specific sub-areas of a few projects I have been working on. I was never able to convince other team members to try the same let alone specify an entire domain with a formal method.
One method I have found particularly interesting is Alloy. I think that it may "scale" better as foundation for an entire project because it is conceptually and notationally very close to actual programming languages. Furthermore, the tools are quite solid so that the benefits of model verification are readily available.
I'd be very much interested to hear about any real-world experiences you folks might have had with using Alloy in your projects. Do you feel that it has helped you in designing a better domain model? Did you find errors in your domain model during verification? Would you use it again?
I've used Alloy on a few projects and have found it helpful; on some but not all of those projects I have been able to persuade others involved to use Alloy as well, or at least to work with the Alloy models I wrote. These projects may or may not be what you have in mind in asking for 'real-world' projects, but they certainly took place in the part of the real world I work in.
In 2006 and 2007 I created a partial Alloy model for the then-current draft of the W3C XProc specification; as far as I could tell, most members of the working group never read the paper I wrote (at http://www.w3.org/XML/XProc/2006/12/alloy-models/models.html); they said "Oh, we changed that part of the spec last week, so what the model says is no longer relevant". But the paper did manage to persuade the editor of the spec that the abstract 'component' level described in the first draft of the spec was woefully underspecified and needed to be either fully specified or dropped. He dropped it, with (I think) good results for the readability and usability of the spec.
In 2010 I made an Alloy model of the XPath 1.0 data model, which uncovered some glitches in the specification. The reaction of most interested parties (including the W3C working group responsible for maintaining the XPath 1.0 spec) has, unfortunately, not been encouraging.
A research project I'm involved with has used Alloy to model the MLCD Overlap Corpus, a collection of sample documents and related information we are creating (hyperlinks suppressed at SO's insistence); the Alloy model found a couple of errors in our initial design for the corpus catalog, so it was well worth the effort.
And we have also used Alloy to formalize some modeling work we have done on the nature of transcription and on the extension of the type/token distinction to document structure (for our paper, look for the 2010 proceedings of Balisage: The Markup Conference). This lies a little bit outside Alloy's usual area of application, as it has nothing to do with software design, but Alloy's ability to check models for consistency and generate instances has been invaluable in showing us some of the logical consequences of this or that possible axiom for our model.
To answer your specific questions: yes, Alloy has helped me specify cleaner domain models, and yes, it has found errors and glitches. They have often been small, for the reasons Daniel Jackson explains in his book Software Abstractions: first, if you use models during design, you catch errors early, when everything is still small. And, second (in Jackson's words), "In hindsight, most software design issues are trivial."
He continues: "But if you don't address them head-on, trivial issues have a nasty habit of becoming nontrivial." My experience amply confirms this. Much better to head off such problems early. So yes, I will use Alloy again.
Yes, I've used Alloy and its cousins industrially. Alloy has been most helpful in convincing me that my models weren't wildly wrong, or rather, in showing me where they were wrong and gave rise to silly results. Other more specific tools, like Song's Athena and Guttman and Ramsdell's CPSA, have been more useful in their narrower domains. What more would you like to hear about?
Belatedly adding to this thread... Eunsuk Kang has recently applied Alloy to perform security analyses of web APIs for some start-ups (following many applications of Alloy in security, such as Apurva's analysis of OAuth and Barth et al.'s analysis of browser-based security mechanisms for CSRF, etc.); Pamela Zave has been working on an impressive analysis of Chord, a peer-to-peer storage system, and has recently written up a fix to the original algorithm.

Practices for programming in a scientific environment? [closed]

Background
Last year, I did an internship in a physics research group at a university. In this group, we mostly used LabVIEW to write programs for controlling our setups, doing data acquisition and analyzing our data. For the first two purposes, that works quite OK, but for data analysis, it's a real pain. On top of that, everyone was mostly self-taught, so code that was written was generally quite a mess (no wonder that every PhD quickly decided to rewrite everything from scratch). Version control was unknown, and impossible to set up because of strict software and network regulations from the IT department.
Now, things actually worked out surprisingly OK, but how do people in the natural sciences do their software development?
Questions
Some concrete questions:
What languages/environments have you used for developing scientific software, especially data analysis? What libraries? (for example, what do you use for plotting?)
Was there any training for people without any significant background in programming?
Did you have anything like version control, and bug tracking?
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (especially physicists are stubborn people!)
Summary of answers thus far
The answers (or my interpretation of them) thus far: (2008-10-11)
Languages/packages that seem to be the most widely used:
LabVIEW
Python
with SciPy, NumPy, PyLab, etc. (See also Brandon's reply for downloads and links)
C/C++
MATLAB
Version control is used by nearly all respondents; bug tracking and other processes are much less common.
The Software Carpentry course is a good way to teach programming and development techniques to scientists.
How to improve things?
Don't force people to follow strict protocols.
Set up an environment yourself, and show the benefits to others. Help them to start working with version control, bug tracking, etc. themselves.
Reviewing other people's code can help, but be aware that not everyone may appreciate that.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I used to work for Enthought, the primary corporate sponsor of SciPy. We collaborated with scientists from the companies that contracted Enthought for custom software development. Python/SciPy seemed to be a comfortable environment for scientists. It's much less intimidating to get started with than say C++ or Java if you're a scientist without a software background.
The Enthought Python Distribution comes with all the scientific computing libraries, including analysis, plotting, 3D visualisation, etc.
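As a small, hedged illustration of what that stack looks like in practice (assuming NumPy and Matplotlib are installed; the signal and file name are made up for the example):

    # Generate a noisy 5 Hz sine, print summary statistics, and plot it.
    import numpy as np
    import matplotlib.pyplot as plt

    t = np.linspace(0.0, 1.0, 500)                  # time axis [s]
    clean = np.sin(2 * np.pi * 5 * t)               # 5 Hz model signal
    noisy = clean + 0.3 * np.random.randn(t.size)   # add Gaussian noise

    print("mean = %.3f, std = %.3f" % (noisy.mean(), noisy.std()))

    plt.plot(t, noisy, label="measured")
    plt.plot(t, clean, label="model")
    plt.xlabel("time [s]")
    plt.ylabel("amplitude")
    plt.legend()
    plt.savefig("signal.png")    # or plt.show() for an interactive window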
Was there any training for people without any significant background in programming?
Enthought does offer SciPy training and the SciPy community is pretty good about answering questions on the mailing lists.
Did you have anything like version control, bug tracking?
Yes, and yes (Subversion and Trac). Since we were working collaboratively with the scientists (and typically remotely from them), version control and bug tracking were essential. It took some coaching to get some scientists to internalize the benefits of version control.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
Make sure they are familiarized with the tool chain. It takes an investment up front, but it will make them feel less inclined to reject it in favor of something more familiar (Excel). When the tools fail them (and they will), make sure they have a place to go for help — mailing lists, user groups, other scientists and software developers in the organization. The more help there is to get them back to doing physics the better.
The course Software Carpentry is aimed specifically at people doing scientific computing and aims to teach the basics and lessons of software engineering, and how best to apply them to projects.
It covers topics like version control, debugging, testing, scripting and various other issues.
I've listened to about 8 or 9 of the lectures and think it comes highly recommended.
Edit: The MP3s of the lectures are available as well.
Nuclear/particle physics here.
Major programming work used to be done mostly in Fortran using CERNLIB (PAW, MINUIT, ...) and GEANT3; recently it has mostly been done in C++ with ROOT and Geant4. There are a number of other libraries and tools in specialized use, and LabVIEW sees some use here and there.
Data acquisition in my end of this business has often meant fairly low level work. Often in C, sometimes even in assembly, but this is dying out as the hardware gets more capable. On the other hand, many of the boards are now built with FPGAs which need gate twiddling...
One-offs, graphical interfaces, etc. use almost anything (Tcl/Tk used to be big, and I've been seeing more Perl/Tk and Python/Tk lately) including a number of packages that exist mostly inside the particle physics community.
Many people writing code have little or no formal training, and process is transmitted very unevenly by oral tradition, but most of the software group leaders take process seriously and read as much as necessary to make up their deficiencies in this area.
Version control for the main tools is ubiquitous. But many individual programmers neglect it for their smaller tasks. Formal bug tracking tools are less common, as are nightly builds, unit testing, and regression tests.
To improve things:
Get on the good side of the local software leaders
Implement the process you want to use in your own area, and encourage those you let in to use it too.
Wait. Physicists are empirical people. If it helps, they will (eventually!) notice.
One more suggestion for improving things.
Put a little time in to helping anyone you work directly with. Review their code. Tell them about algorithmic complexity/code generation/DRY or whatever basic thing they never learned because some professor threw a Fortran book at them once and said "make it work". Indoctrinate them on process issues. They are smart people, and they will learn if you give them a chance.
This might be slightly tangential, but hopefully relevant.
I used to work for National Instruments, R&D, where I wrote software for NI RF & Communication toolkits. We used LabVIEW quite a bit, and here are the practices we followed:
Source control. NI uses Perforce. We did the regular thing - dev/trunk branches, continuous integration, the works.
We wrote automated test suites.
We had a few people who came in with a background in signal processing and communication. We used to have regular code reviews, and best practices documents to make sure their code was up to the mark.
Despite the code reviews, there were a few occasions when "software guys" like me had to rewrite some of this code for efficiency.
I know exactly what you mean about stubborn people! We had folks who used to think that pointing out a potential performance improvement in their code was a direct personal insult! It goes without saying that this calls for good management. I thought the best way to deal with these folks was to go slowly, not press too hard for changes, and if necessary be prepared to do the dirty work. [Example: write a test suite for their code.]
I'm not exactly a 'natural' scientist (I study transportation) but am an academic who writes a lot of my own software for data analysis. I try to write as much as I can in Python, but sometimes I'm forced to use other languages when I'm working on extending or customizing an existing software tool. There is very little programming training in my field. Most folks are either self-taught, or learned their programming skills from classes taken previously or outside the discipline.
I'm a big fan of version control. I used Vault running on my home server for all the code for my dissertation. Right now I'm trying to get the department to set up a Subversion server, but my guess is I will be the only one who uses it, at least at first. I've played around a bit with FogBugs, but unlike version control, I don't think that's nearly as useful for a one-man team.
As for encouraging others to use version control and the like, that's really the problem I'm facing now. I'm planning on forcing my grad students to use it on research projects they're doing for me, and encouraging them to use it for their own research. If I teach a class involving programming, I'll probably force the students to use version control there too (grading them on what's in the repository). As far as my colleagues and their grad students go, all I can really do is make a server available and rely on gentle persuasion and setting a good example. Frankly, at this point I think it's more important to get them doing regular backups than get them on source control (some folks are carrying around the only copy of their research data on USB flash drives).
1.) Scripting languages are popular these days for most things due to better hardware. Perl/Python/Lisp are prevalent for lightweight applications (automation, light computation); I see a lot of Perl at my work (computational EM) since we like Unix/Linux. For performance-critical stuff, C/C++/Fortran are typically used. For parallel computing, we usually manually parallelize runs in EM rather than having a program do it implicitly (i.e. split up the jobs by look angle when computing radar cross sections).
2.) We just kind of throw people into the mix here. A lot of the code we have is very messy, but scientists are typically a scatterbrained bunch that don't mind that sort of thing. Not ideal, but we have things to deliver and we're severely understaffed. We're slowly getting better.
3.) We use SVN; however, we do not have bug tracking software. About as good as it gets for us is a txt file that tells you where specific bugs are.
4.) My suggestion for implementing best practices for scientists: do it slowly. As scientists, we typically don't ship products. No one in science makes a name for himself by having clean, maintainable code. They get recognition from the results of that code, typically. They need to see justification for spending time on learning software practices. Slowly introduce new concepts and try to get them to follow; they're scientists, so after their own empirical evidence confirms the usefulness of things like version control, they will begin to use it all the time!
I'd highly recommend reading "What Every Computer Scientist Should Know About Floating-Point Arithmetic". A lot of problems I encounter on a regular basis come from issues with floating point programming.
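A classic example of the kind of surprise that paper covers, in plain Python with nothing domain-specific:

    # 0.1 and 0.2 have no exact binary representation, so the sum is not 0.3.
    a = 0.1 + 0.2
    print(a)          # 0.30000000000000004
    print(a == 0.3)   # False

    # Compare with a tolerance instead of exact equality.
    import math
    print(math.isclose(a, 0.3, rel_tol=1e-9))   # True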
I am a physicist working in the field of condensed matter physics, building classical and quantum models.
Languages:
C++ -- very versatile: can be used for anything, good speed, but it can be a bit inconvenient when it comes to MPI
Octave -- good for some supplementary calculations, very convenient and productive
Libraries:
Armadillo/Blitz++ -- fast array/matrix/cube abstractions for C++
Eigen/Armadillo -- linear algebra
GSL -- to use with C
LAPACK/BLAS/ATLAS -- extremely big and fast, but less convenient (and written in FORTRAN)
Graphics:
GNUPlot -- it has very clean and neat output, but not that productive sometimes
Origin -- very convenient for plotting
Development tools:
Vim + plugins -- it works great for me
GDB -- a great debugging tool when working with C/C++
Code::Blocks -- I used it for some time and found it quite comfortable, but Vim is still better in my opinion.
I work as a physicist in a UK university.
Perhaps I should emphasise that different areas of research have different emphasis on programming. Particle physicists (like dmckee) do computational modelling almost exclusively and may collaborate on large software projects, whereas people in fields like my own (condensed matter) write code relatively infrequently. I suspect most scientists fall into the latter camp. I would say coding skills are usually seen as useful in physics, but not essential, much like physics/maths skills are seen as useful for programmers but not essential. With this in mind...
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Commonly, data analysis and plotting are done using generic data analysis packages such as IGOR Pro, ORIGIN or Kaleidegraph, which can be thought of as 'Excel plus'. These packages typically have a scripting language that can be used for automation. More specialist analysis may have a dedicated utility for the job, which generally will have been written a long time ago, which no one has the source for, and which is pretty buggy. Some more techie types might use the languages that have been mentioned (Python, R, MATLAB, with Gnuplot for plotting).
Control software is commonly done in LabVIEW, although we actually use Delphi which is somewhat unusual.
Was there any training for people without any significant background in programming?
I've been to seminars on grid computing, 3D visualisation, learning Boost etc. given by both universities I've been at. As an undergraduate we were taught VBA for Excel and MatLab but C/MatLab/LabVIEW is more common.
Did you have anything like version control, bug tracking?
No, although people do have personal development setups. Our code base is in a shared folder on a 'server' which is kept current with a synching tool.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
One step at a time! I am trying to replace the shared folder with something a bit more solid; perhaps finding an SVN client which mimics the current syncing tool's behaviour would help.
I'd say though on the whole, for most natural science projects, time is generally better spent doing research!
Ex-academic physicist and now industrial physicist UK here:
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I mainly use MATLAB these days (easy access to visualisation functions and maths). I used to use Fortran a lot, and IDL. I have used C (but I'm more a reader than a writer of C) and Excel macros (ugly and confusing). I currently need to be able to read Java and C++ (but I can't really program in them), and I've hacked Python as well. For my own entertainment I'm now doing some programming in C# (mainly for portability / low cost / pretty interfaces). I can write Fortran in pretty much any language I'm presented with ;-)
Was there any training for people without any significant background in programming?
Most (all?) undergraduate physics courses will have a small programming course, usually in C, Fortran or MATLAB, but it's the real basics. I'd really like to have had some training in software engineering at some point (revision control / testing / designing medium-scale systems).
Did you have anything like version control, bug tracking?
I started using Subversion / TortoiseSVN relatively recently. Groups I've worked with in the past have used revision control. I don't know any academic group which uses formal bug tracking software. I still don't use any sort of systematic testing.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
I would try to introduce some software engineering ideas at undergraduate level and then reinforce them by practice at graduate level, also provide pointers to resources like the Software Carpentry course mentioned above.
I'd expect that a significant fraction of academic physicists will be writing software (not necessarily all though) and they are in dire need of at least an introduction to ideas in software engineering.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Python, NumPy and pylab (plotting).
Was there any training for people without any significant background in programming?
No, but I was working in a multimedia research lab, so almost everybody had a computer science background.
Did you have anything like version control, bug tracking?
Yes, Subversion for version control, Trac for bug tracing and wiki. You can get free bug tracker/version control hosting from http://www.assembla.com/ if their TOS fits your project.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!).
Make sure the infrastructure is set up and well maintained and try to sell the benefits of source control.
I'm a statistician at a university in the UK. Generally people here use R for data analysis; it's fairly easy to learn if you know C/Perl. Its real power is in the way you can import and modify data interactively. It's very easy to take a number of, say, CSV (or Excel) files, merge them, create new columns based on others, and then throw that into a GLM, GAM or some other model. Plotting is trivial too and doesn't require knowledge of a whole new language (like PGPLOT or GNUPLOT). Of course, you also have the advantage of a bunch of built-in features (from simple things like mean and standard deviation all the way to neural networks, splines and GL plotting).
Having said this, there are a couple of issues. With very large datasets R can become very slow (I've only really seen this with datasets larger than about 50,000 x 30), and since it's interpreted you don't get the advantage of Fortran/C in this respect. But you can (very easily) get R to call C and Fortran shared libraries (either from something like Netlib or ones you've written yourself). So a usual workflow would be to:
Work out what to do.
Prototype the code in R.
Run some preliminary analyses.
Re-write the slow code into C or Fortran and call that from R.
Which works very well for me.
I'm one of the only people in my department (of >100 people) using version control (in my case git with github.com). This is rather worrying, but they just don't seem keen on trying it out and are content with passing zip files around (yuck).
My suggestion would be to continue using LabVIEW for the acquisition (and perhaps try to get your co-workers to agree on a toolset for acquisition and make it available to everyone), and then move to exporting the data into CSV (or similar) and doing the analysis in R. There's really very little point in re-inventing the wheel in this respect.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
My undergraduate physics department taught LabVIEW classes and used it extensively in its research projects.
The other alternative is MATLAB, in which I have no experience. There are camps for either product; each has its own advantages/disadvantages. Depending on what kind of problems you need to solve, one package may be more preferable than the other.
Regarding data analysis, you can use whatever kind of number cruncher you want. Ideally, you can do the hard calculations in language X and format the output to plot nicely in Excel, Mathcad, Mathematica, or whatever the flavor du jour plotting system is. Don't expect standardization here.
Did you have anything like version control, bug tracking?
Looking back, we didn't, and it would have been easier for us all if we did. Nothing like breaking everything and struggling for hours to fix it!
Definitely use source control for any common code. Encourage individuals to write their code in a manner that could be made more generic. This is really just coding best practices. Really, you should have them teaching (or taking) a computer science class so they can get the basics.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
There is a clear split between data acquisition (DAQ) and data analysis. Meaning, it's possible to standardize on the DAQ and then allow the scientists to play with the data in the program of their choice.
Another good option is Scilab. It has graphic modules à la LabVIEW, it has its own programming language and you can also embed Fortran and C code, for example. It's being used in public and private sectors, including big industrial companies. And it's free.
About versioning, some prefer Mercurial, as it gives more liberties managing and defining the repositories. I have no experience with it, however.
For plotting I use Matplotlib. I will soon have to make animations, and I've seen good results using MEncoder. Here is an example including an audio track.
Finally, I suggest going modular, that is, trying to keep the main pieces of code in different files, so code revision, understanding, maintenance and improvement will be easier. I have written, for example, a Python module for file integrity testing, another for image processing sequences, etc.
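As a minimal sketch of that modular style (the module and function names here are hypothetical, not the author's actual code):

    # checksums.py -- kept in its own file so analysis scripts can reuse it
    # with `from checksums import sha256_of`.
    import hashlib

    def sha256_of(path, chunk_size=65536):
        """Return the SHA-256 hex digest of a file, read in chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    if __name__ == "__main__":
        # Quick self-test when run directly: print one digest per file argument.
        import sys
        for name in sys.argv[1:]:
            print(name, sha256_of(name))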
You should also consider developing with a debugger that allows you to check variable contents at settable breakpoints in the code, instead of using print lines.
I have used Eclipse for Python and Fortran development (although I got a false bug when compiling a short Fortran program with it, but it may have been a bad configuration), and I'm starting to use the Eric IDE for Python. It lets you debug, manage versioning with SVN, has an embedded console, can do refactoring with Bicycle Repair Man (it can use another tool, too), has unittest support, etc. A lighter alternative for Python is IDLE, included with Python since version 2.3.
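A minimal sketch of breakpoint-style debugging with Python's standard pdb module (graphical debuggers in Eclipse or Eric expose the same idea through the editor):

    # Pause execution at a chosen point and inspect state interactively,
    # instead of sprinkling print statements through the code.
    import pdb

    def normalise(values):
        total = sum(values)
        pdb.set_trace()    # at the (Pdb) prompt: `p total`, `n` to step, `c` to continue
        return [v / total for v in values]

    if __name__ == "__main__":
        print(normalise([1.0, 2.0, 3.0]))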
As a few hints, I also suggest:
Not using single-character variables. When you want to search for occurrences, you will get results everywhere. Some argue that a decent IDE makes this easier, but then you will depend on having permanent access to the IDE. Even using ii, jj and kk can be enough, although this choice will depend on your language. (Double vowels would be less useful if code comments are written in Estonian, for instance.)
Commenting the code from the very beginning.
For critical applications it is sometimes better to rely on older language/compiler versions (major releases), which are more stable and better debugged.
Of course you can have more optimized code and fixed bugs in later versions, but I'm talking about using Fortran 95 instead of 2003, Python 2.5.4 instead of 3.0, and so on (especially when a new version breaks backwards compatibility). Lots of improvements usually introduce lots of bugs. Still, this will depend on the specific application.
Note that this is a personal choice, many people could argue against this.
Use redundant and automated backup! (With versioning control).
Definitely, use Subversion to keep current, work-in-progress, and stable snapshot copies of source code. This includes C++, Java etc. for homegrown software tools, and quickie scripts for one-off processing.
With the strong leaning in science and applied engineering toward a "lone cowboy" development methodology, the usual practice of organizing the repository into trunk, tags and the rest - don't bother! Scientists and their lab technicians like to twirl knobs, wiggle electrodes and chase vacuum leaks. It's enough of a job to get everyone to agree on, say, Python/NumPy or to follow some naming convention; forget trying to make them follow arcane software developer practices and conventions.
For source code management, centralized systems such as Subversion are superior for scientific use due to the clear single point of truth (SPOT). Logging of changes, and the ability to recall versions of any file without having to chase down where to find something, has huge record-keeping advantages. Tools like Git and Monotone: oh my gosh, the chaos I can imagine would follow! Having clear-cut records of just which version of the hack-job scripts was used while toying with the new sensor when that Higgs boson went by, or that supernova blew up, will lead to happiness.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Languages I have used for numerics and scientific-related stuff:
C (slow development, too much debugging, almost impossible to write reusable code)
C++ (and I learned to hate it -- development isn't as slow as with C, but it can be a pain; templates and classes were cool initially, but after a while I realized that I was fighting them all the time and finding workarounds for language design problems)
Common Lisp, which was OK, but not widely used for scientific computing. Not easy to integrate with C (compared to other languages), but it works
Scheme. This one became my personal choice.
My editor is Emacs, although I do use vim for quick stuff like editing configuration files.
For plotting, I usually generate a text file and feed it into gnuplot.
For data analysis, I usually generate a text file and use GNU R.
I see lots of people here using FORTRAN (mostly 77, but some 90), lots of Java and some Python. I don't like those, so I don't use them.
Was there any training for people without any significant background in programming?
I think this doesn't apply to me, since I graduated in CS -- but where I work there is no formal training; people (engineers, physicists, mathematicians) do help each other, though.
Did you have anything like version control, bug tracking?
Version control is absolutely important! I keep my code and data on three different machines, on two different sides of the world, in Git repositories. I sync them all the time (so I have version control and backups!). I don't do bug tracking, although I may start doing that.
But my colleagues don't use bug tracking systems or version control at all.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
First, I'd give them as much freedom as possible. (At the university where I work I could choose between having someone install Ubuntu or Windows, or installing my own OS; I chose to install my own. I don't have support from them and I'm responsible for anything that happens with my machine, including security issues, but I do whatever I want with it.)
Second, I'd see what they are used to, and make it work (need FORTRAN? We'll set it up. Need C++? No problem. Mathematica? OK, we'll buy a license). Then I'd see how many of them would like to learn "additional tools" to help them be more productive (don't say "different" tools; say "additional", so it won't seem like anyone will "lose" or have to "let go" of anything). Start with editors, see if there are groups who would like to use a VCS to sync their work (hey, you can stay home and send your code through SVN or Git -- wouldn't that be great?), and so on.
Don't impose; show examples of how cool these tools are. Do some data analysis using R and show them how easy it was. Show nice graphics, and explain how you created them (but start with simple examples so you can explain them quickly).
I would suggest F# as a potential candidate for performing science-related manipulations given its strong semantic ties to mathematical constructs.
Also, its support for units-of-measure, as written about here makes a lot of sense for ensuring proper translation between mathematical model and implementation source code.
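For comparison, a rough run-time analogue of that units-of-measure idea is available in Python through the third-party pint library; this is my own illustration, not something the F# suggestion depends on:

    # Attach units to quantities so dimensionally inconsistent operations fail
    # loudly instead of producing silently wrong numbers.
    import pint

    ureg = pint.UnitRegistry()
    distance = 100.0 * ureg.meter
    time = 9.58 * ureg.second

    speed = distance / time
    print(speed.to(ureg.kilometer / ureg.hour))   # ~37.6 kilometer / hour

    try:
        distance + time                           # metres plus seconds: rejected
    except pint.DimensionalityError as err:
        print("rejected:", err)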
First of all, I would definitely go with a scripting language, to avoid having to explain a lot of extra things (for example, manual memory management is mostly OK if you are writing low-level, performance-sensitive stuff, but for somebody who just wants to use the computer as an upgraded scientific calculator it's definitely overkill). Also, look around to see whether there is something specific for your domain (as R is for statistics). This has the advantage of already working with the concepts the users are familiar with, and of having specialized code for specific situations (for example calculating standard deviations, applying statistical tests, etc. in the case of R).
If you wish to use a more generic scripting language, I would go with Python. Two things it has going for it are:
The interactive shell where you can experiment
Its clear (although sometimes lengthy) syntax
As an added advantage, it has libraries for most of the things you would want to do with it.
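As a tiny illustration of that "upgraded scientific calculator" style of use (standard library only, values made up):

    # The sort of thing you would type straight into the interactive shell.
    import statistics

    measurements = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7]
    print("mean   =", statistics.mean(measurements))
    print("stdev  =", statistics.stdev(measurements))
    print("median =", statistics.median(measurements))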
I'm no expert in this area, but I've always understood that this is what MATLAB was created for. There is a way to integrate MATLAB with SVN for source control as well.