Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I've been wanting to get into a pure-OO language for a while now, but I'm put off by the fact that they all seem to demand an IDE and I can't find any good tutorials that aren't in video format.
I'm happy to use an IDE later, but I don't want to learn the language through one. What I'm looking for is a simple console interpreter or command-line compiler such as gcc, ghc, ghci and the python IDLE (yes, it's an IDE, but it's so minimalist that it may as well just be a commandline interpretter). I find that I learn a language faster, better and more comprehensively when I'm not trying to grapple with an IDE at the same time. So please, don't tell me that squeak is the only way to do it :P
I'm also looking for tutorials that are presented textually rather than visually. Again, I learn faster when I can stare at a page and read someone's sentance over and over tossing and turning it in my mind rather than having to pause a video, take it back 10 seconds, press play, do it again, and again, and again.
I'm interested in various languages with various degrees of OO-purity, and I plan to learn them all at some point. Any of the smalltalk dialects interest me, Self (an extreme prototype-oriented version of smalltalk (Very interesting, the more radical the better imo)), strongtalk, vanilla smalltalk (or some implementation which is as vanilla as you can get).
I'm interested in Eiffel as well, the code snippets I've seen make it seem very elegant and I've read that it actually was very innovative (introduced code-contracts and other such things). However I would give preference to a language from the smalltalk camp over one from the Eiffel side because Eiffel at face value seems to be a hybrid between OO and imperative programming. Similarly I'd rather avoid Scala (Hybrid OO and functional) and other hybrid languages. So no C#, Java, C++, D, python etc etc etc. I'm not dismissing these languages because I believe they are bad, it's just that I'm setting out to learn pure-OO and those languages are hybrid OO: Not really what I'm looking for.
Also, would anyone be able to recommend the official books? For smalltalk there's the "Blue book" AKA "Smalltalk-80: The Language and its Implementation". And for Eiffel there's "Eiffel: The Language". I ask because in my experience you can pick up so much by reading books written by the author of the language (see K&R the C programming language), and by reading books in general.
So yes, my questions: What pure-OO language would be good to start off with? How would I go about learning it without having to use an IDE? And is there an associated book written by the language author(s)?
It is not helpful to learn Smalltalk as just another language. You would be missing the point entirely.
Smalltalk's graphical environment is not just an IDE. The core of the system is simply objects. The interface provides various ways to create objects and interact with them. The language is just a convenient way to create messages to the objects. It is secondary to the objects themselves.
In other OO languages, you write your program, then you run it, which creates objects in memory. Not so in Smalltalk. You create objects in memory (e.g. class objects) and then send messages to e.g. add methods. But a class object is only created once, not every time you "run your program".
There is no such thing as "your program", in fact. There is no "main". It's just a world of objects, some longer-lived, some temporary. In fact, in the system there are objects that were created 30 years ago. Literally. The objects are just frozen to disk as a memory dump (a file which we call "image") and unfrozen later (possibly on a different machine).
That image, the world of objects, is the primary artifact in Smalltalk. There is a sources file, yes, but that's just a database of text snippets to not take up so much RAM. You cannot edit this file by hand (objects in the image use absolute file offsets into the sources file). You cannot re-create the system from the sources file - the system was bootstrapped a long time ago and from then on only modified.
It's true that superficially the Smalltalk GUI looks just like another IDE. No coincidence - Eclipse was originally written by Smalltalkers in Smalltalk. But there is the crucial difference that in regular IDEs you just manipulate text files. A text editor is a valid alternative for that. In Smalltalk, the GUI manipulates objects in memory. A text editor can not do that.
And as for what Smalltalk to use, I would recommend Squeak. Very friendly community, very nice environment, and subscribing to the original Smalltalk vision of creating a great personal computing environment for everyone.
As someone who has went through process of learning Smalltalk (at least to a decent degree), I can say that you are taking harder and riskier path, in a sense that some things may take much longer to clear up, or never actually do.
But, if you insist, you can download GNU Smalltalk, for which no GUI is a norm. It also contains all sources of the system written in Smalltalk in a chunk format and you can open your text editor on them and enjoy while slowly reading through the guts of the system.
You could also startup any other Smalltalk, like Pharo, and just stick with a workspace window - this is your equivalent of command line interpreter.
Pharo also includes ProfStef quick interactive tutorial on Smalltalk, which combines text instructions and evaluating Smalltalk expressions.
As for reading, there is Pharo By Example - free book that you can browse, download or buy hardcopy.
There is also a collection of free books in which I would recommend "Smalltalk-80: The Language and its Implementation" By Adele Goldberg and David Robson, if you are interested in the innards and detail of the language.
Late David N. Smith Smalltalk FAQ is also exelent resource.
So, there you go. And take advice, and give in to the Smalltalk IDE as soon as possible, since it makes understanding of Smalltalk much, much, faster.
Richard Gabriel gave a talk recently about a paradigm shift that occurred in the programming language community in the early 90s. He claims that most experts today are incapable of understanding many of the papers from the 80s. He has evidence to back this up. This was the first time he gave the talk, and he expects to give it many times, so I imagine that many parts of the talk will change. At first, he described this paradigm shift as engineering -> science, but then he described it as system -> language. I think that describing it as a shift from systems thinking to language thinking is a better description.
Richard Gabriel is a Lisp guy. (I'm a Smalltalk guy). Lisp is like Smalltalk in that there isn't a clear boundary between the language and the library that it uses. Arithmetic and control flow are in the libraries, not the language. (Well, Lisp has some in the language and some in the libraries, while Smalltalk has it all in the libraries, except that the compiler cheats and hard codes some of them, so there isn't really much difference in the end.) In Lisp, a program is an S-expression, and editing programs is editing S-expressions. In Smalltalk, a program is a collection of objects, and editing programs is editing objects. When you are programming, you are building a system, and you program with the system.
System thinking is different from language thinking. Language thinkers want a precise description of a language. They want a book that describes the whole thing, or (if they are academics) they want a formal semantics for the language. But system thinkers know that as soon as they start to use the system, it will change. They want to understand how the system works, but are prepared to look at the system itself to figure out the details.
These are two ways of thinking, and there are advantages and disadvantages of each. Smalltalk is a wonderful example of systems thinking. I think all software developers should know at least one system that exemplifies systems thinking. Lisp is good. Forth is another old example. Naturally, I think that Smalltalk is great and am happy to help people learn it but I think the importance of learning systems thinking is more important than the particular system you learn.
Unfortunately, learning a system is harder than learning a language. You have to do more than just learn the syntax, you have to learn the libraries, the patterns of naming and of coding, and usually the tools. (Which, if this is a system, are extensible.) That is one of the advantages of language thinking. But systems thinking has long-term advantages, because once you taylor the system to your needs then you can become very productive.
To lean smalltalk syntax, you need to read ONE page of text (see Syntax section on wiki http://en.wikipedia.org/wiki/Smalltalk).
Now, to learn a smalltalk libraries and how to use them, you need to use browser not the text editor, otherwise your will just waste a lot of time.
I think that it is like factor of 10 difference in time, between trying to understand some code by reading in textual format and navigating it using browser and! debugger.
In smalltalk system a living objects could tell a lot about themselves and help you learn how to use them much faster than if you look at it as a static chunks of text, because you won't grasp the idea at all.
I've been playing with Squeak Smalltalk (and its close cousins, Pharo and Cuis) for a while now. There's no better way to learn Smalltalk than by using the system already provided.
I've devised a series of short youtube tutorials ranging in length from 50 seconds to 15 minutes that show how to take advantage of Squeak's ultra-cool features within a few minutes of first starting the system.
In fact, the very first line of code demos the OOP-ness of Squeak. Squeak from the very start
Python is a pure OOP . Actually this is an easy mistake that newcomers make when they come to python.
Python like smalltalk follows the mantra "Everything is an object". So everything inside python is an object, including built-in types. The difference is that python unlike smalltalk and Java does not force OOP as it allows procedural programming. And this is the trap, it easy to assume that makes python less OOP , but being a snake, is so devlish that does not tell you that even functions are objects ;)
http://www.linuxtopia.org/online_books/programming_books/python_programming/python_ch10s04.html
Going back to smalltalk its IDE is the huge deal here, contrary what other smalltalker may believe. If you like me are heavily disappointed with how non flexible IDEs are you are going to love Squeak's IDE. The IDE goes a great deal making easy to navigate through all the libraries and making you understand what , where and why , something happens. I cant see the benefit of using a text editor. But you can, with file ins and file outs. But doing so you cripple smalltalk into becoming as efficient as other programming languages ;)
I am only studying squeak and pharo for a week now but even for me as a beginner the benefits of the IDE is obvious from the first minute.
The fact that code is fragmented into easy to digest methods, those methods grouped into protocols , protocols grouped to our familiar Classes and Classes grouped to packages. Hence the code is so well organized that I never feel lost, everything belongs somewhere, everything is just a click away, everything is inspectable, browseable , you just select right click and sends you there. And it shows you exactly the code you need rarely more than 10 lines long. This is the IDE. Why would you prefer a text editor that will expose to information that you don't need , don't care and is likely to confuse you ?
Then everything is inside a single image , not a collection of files, your code, your libraries, system libraries , even the language itself. Everything is at your grasp, waiting for you, begging you to test, modify it, use it and abuse it. You are part of the language and the language is part of your, if something does not fit your thinking, change it. This is the IDE. Why you want to go back to the disconnected way of files and folders ?
Then you are start being afraid with all this power, all this flexibility its not unlikely that you will do something that could completely destroy the language and the libraries. Its possible , mistakes can and will happen. Again the IDE jumps in offering you a hand of help, every change is stored in a local cvs system, every change is categorized, stored and monitored any time. No lousy undos and any kind of other nonsense . What you get is old , mature well tested version control. You can change back exactly what you want any time, nothing is lost, no mistake is irreversible.
And if you don't trust you hard driver , the vcs extends online to squeaksource . And does it let you at the mercy of command line ? Hell no . You are offered the simple yet efficient Monticello browser , which will make sure you install and unistall with no conflicts .
And of course you don't want your software to have bugs , do you ? Unit Testing tool is offered to make sure your code is reliable , stable and does exactly what you want how you want it. Again a beautiful yet brilliant GUI is utilized to make complicate tasks a button away.
And because none is perfect , there will be time you will come against the dreadful error. Are you left alone ? You guessed right , a tool again is offered. The debugger. You don't need to call it, you don't need to setup it , you don't even need to figure out how it works. Like all other tools, is simple in design yet sophisticated. Not only it will spot the error , not only will tell you what you did wrong , not only will navigate through back to most basic language elements that trigger the error offering a unique perspective on how exactly the language behave like nothing I have seen before, it also allows you to do live coding. Live coding is the ability to code a program while its code runs. Isn't that impressive and infinitely useful ?
Finally , maybe you are one of those people impossible to please, maybe you still find flaws , omissions and thinks you simple don't like. The IDE is written in smalltalk , smalltalk is written in smalltalk , and the IDE can edit itself and the language, there is nothing you can't change besides some very basic functionality of the language and the VM that is compiled C. And you will guess right if you think you can use all the above tools to do exactly that.
And the tools don't stop here , smalltalk might be not that popular as other languages but it has been here for a very long time and it has some very enthusiastic programmers that love to contribute. And frankly with such an amazing IDE and such a well designed language , while with other languages contributing to them might seem a challenge, in case of smalltalk the challenge is to resist the temptation not to contribute as the IDE makes it so easy.
By the time others still code you will finish your code and actually understand what have you done and why. Thats not a small thing at all . I wish Python had such a good IDE or any other language. But the only thing that comes abit close, from my experience , is Delphi. And even in the case of Delphi I still prefer squeak and pharo.
What I find annoying about other IDEs is that they are not IDES at all, they are nothing more than glorified editors, locked, non flexible , non editable (Unless you are willing to use another programming language and navigate through tons of source code) . Squeak , Pharo and all other smalltalk dialects offer a real elegant IDE offering you really useful tools. Other IDEs better take a deep a look at smalltalk and really understand what it means to be an IDE.
Saying all those good things, smalltalk is far from perfect. And I think its biggest weakness and flaw is lack of some enjoyable and useful documentation that can help beginners jump in head first. Squeak By Example as well Pharo By Example has been a big disappointment for me. They both are still two extremely important books that provide a extremely valuable insight in both platforms , but the quality of documentation is from mediocre to bad at times. The main reason is both books follow a non noob friendly approach. First they send you deep diving in the IDE , introducing you from chapter 1 , to debugger and even unit testing !!! For me this a big mistake, and even though I am far from new to programming had to struggle to follow up what was explained. Then the book itself , lets a lot of unanswered questions. For example the explanation of instance vs class variables is not enough, I would prefer several example that not only show the how but also the why . Several areas of the book are also full of gaps or just hard to follow.
My life got a lot easier when I found this link http://stephane.ducasse.free.fr/FreeBooks.html and from there I downloaded "Smalltalk by Example" which unlike the other book not only it does what it says in the title but makes no assumption on who you are and what you know. I can only highly recommend it. I read that the other books there that are offered freely are very good as well, I will certainly download and read all of them eventually.
Alot of help has been also #squeak at irc.freenode.net, people there has been answering my questions and helping me understand.
Squeak wiki, is ok but not enough, its also not very well organised, and I dont like that comments and discussions appear inside the wiki documentation. So documentation generally can be abit of a struggle for the begginer and certainly Smalltalk IS NOT AN EASY programming language to learn. I hear many smalltalkers say otherwise and I could not disagree more, when I compare smalltalk with python is like night and day. BUT ! Once understand smalltalk , it become much easier to program in it then any other programming language I have learned so far, and I have learned most of them. So in the end I think Smalltalk is a clear win , I also love the FFI library that lets you call any C library with ease, which unleashes serious power for smalltalk.
I dont think you need to learn the language first and then the IDE, its actually a very bad idea for the simple fact that the IDE helps you understand the language and its libraries and any type of code in it. Language and IDE is like brother and sister, yin and yang.
Well, if you decide to learn Eiffel a good book would be "Object-Oriented Software Construction" by Betrand Meyer (he created the Eiffel programming language).
The book provides great insight into object-oriented design using Eiffel. In my humble opinion is one of the best OO books around.
I have wonder that many big applications (e.g. social websites such as facebook) are build with many languages into its platform.
They usually start with AJAX browser support, then scale down to PHP scripting, then move towards a powrful OOP technologie such as Java or .NET, and finally a primitive language to increase performance in crucial operations such as C.
My question is how should I determinate the edge of the layers between languages. When PHP, when Java, when C and so on. And the other question is if should those languages integrate in a vertcal fashion for simplicity and maintanance, or could it be cases when you decide to program on module of your app in Java and the other in native C.
What are the context variables that push me to move to a better performance language? (e.g. concurrency issues due increase of users)
Don't tell me that PHP overlaps .NET and Java Technologies. In a starter point it does, but when the network is overload you start seeing the diferences. I mean how can I achieve Multithreading in PHP as in Java with the same performance. The thing it's hard to answer my wuestion is becasue there is not so much reading about this. You maybe find some good books covering PHP, but few telling how when and why integrate different languages.
Each language was created for different purposes, Python is strong with string operations, Perl very powerful in batch scripting, PHP a very reliable application web server, C the mother of most popular languages.
Best,
Demian.
On one end of the scale, you move to a higher performance language whenever your profiling and measurements tell you that you have a bottleneck that can't be fixed with better algorithms, data structures, or other optimisation.
At the other end, you move to a higher level language (ie. more abstraction, better libraries) whenever your management allow you to do so. ;)
I believe most teams simply use what they are best familiar with.
There are also questions of licensing that can influence the decision.
That is, if you're talking about technologies that compare to each other and solve the problem on the same level (for example ASP.NET/JSF/JSP/PHP...). But you can't compare .NET with C++ for example, they are meant to solve different problems on different abstraction levels.
My criterion for any programming language is "does it help me to get the job done or does it just get in the way?" If the latter, then it's time to move on.
From an economical point of view the answer is easy: on a regular basis just look what will be cheaper. Either continue with the current technology and maybe stretch the envelope a bit more. Or switch to something new. When you compare the two alternatives the cost of the investment already done is not important anymore since you've already spent that money/effort. You only have to look ahead: cost of licenses, education, etc.
Of course this is easier said then done, but just sitting down with a few people, thinking about it, and maybe try to come up with some numbers already helps a lot. I have seen too many projects that continued with technology that really wasn't suited for the job anymore.
Also hard numbers don't tell the whole story. There will be resistance because of unfamiliar technology, experts who are losing their status, etc.
Identify the bottleneck
Solve bottleneck
Go to 1
I'm sure you can imagine that step 2 is the one where decisions like "What programming language do we use" and "where do we put the coffee machine" come into play. That's the basic rule.
This is not 100% programming related. But I think this is somewhat useful because it is addressing a minority in the SO community.
Microcontroller programming is one of the interesting areas in programming. I saw some topic here requesting the Resources for starting / learning / discussing about PICs.
Example topic
Since I have plenty of knowledge and experiences in this area I am thinking of publishing some resources that helps a novice to learn them from the basics. It will be not just a theoretical publication and will be based on example projects. I hope to start this over a new blog + forum so the users can dynamically interact with each other. I came in to this decision because I found very small amount of Sites that a novice can start learning and work collaboratively.
What do you guys think about this? Have you ever experienced such difficulty? Do you think you can get some use of that? What are the things you like to see on the site?
I would be thankful If you are not going to close this as NPR. I just want to do some service to other microcontroller lovers :)
There are already a few such tutorials on the net (e.g. this one from SparkFun), another one might be a valuable addition, but only if it is better or different in some way.
What will you offer that is a real improvement?
Some suggestions:
Don't assume I have windows
Have some side discussion of difference between various MCU and/or supporting electronics. Discuss some of the trade offs
You'll need a pretty general tutorial to suck people in, but the real value added might be in a specialized focus after the start
Build up to something useful and/or geeky cool
A unit on component integration (i.e. I can buy a Polar style heart rate receiver, and a MCU and a USB interface. How do I get them talking to each other so I can build an exercise data logger?)
What every you do, I'm looking forward to it (just learning embedded stuff in my spare time...).
There are the excellent tutorials at www.mikrocontroller.net, but they are in German.
If you could create something similar for an English speaking community, that would be great.
Yes! The more resources out there for helping with embedded software (microcontroller programming) the better.
It can be quite daunting to start with, especially if you've only written software for PCs or similar in the past. There are lot more constraints (e.g. on RAM and code space), and a whole load of things you need to know that don't apply to non-embedded software.
As others have mentioned here, there a number of websites that cover different aspects of this; some others are OnARM, for ARM processors, the related STM32 Circle, and Jack Ganssle's articles on his website and on Embedded.com.
Though embedded systems are an enormous market (just think how many such devices there are in your house, or in your car), my impression is that there is a lot less coverage of the subject on the web - and on Stack Overflow - than for non-embedded.
So, I look forward to seeing the fruits of your labour!
Something else that's worth to take into account when targeting beginners, is to directly provide pointers to useful resources, such as suitable simulators/emulators, or even addresses/webpages where you can easily order a starter kit or even free samples of some chips.
For example, most semiconductor manufacturers provide free samples of their products, e.g. see microchip.com or atmel.com.
Ideally, an introductory course would be based on working with such a hardware simulator or emulator in the beginning, so that the project and all relevant experience may directly map onto a real device once the beginner is interested in moving his work onto a real chip, providing pointers to freely available resources, or very affordable starter kits can be very useful.
This would ensure that beginners can get started as easily and cheaply as possible.
Maybe for the different ARM7 and CortexM3...?
Here everyone asumes there is a lot of information, but it is spread all over the net and without any red line what so ever...
But if you take AVR there is quite a lot of stuff over at http://www.avrfreaks.net, and I guess that PIC has quite a lot as well.
I have written many such examples myself but they are scattered and not organised and probably rarely read (one time the folks at avrfreaks borrowed something). StackOverflow might curb this but SO could in theory be used. Ask a question about boot code for an arm whatsit, then answer your own question with example code and text on how and why it works. The SO tags would be nice in that you could do a search on "boot" "arm" "embedded" and then one on "boot" "avr" "embedded", etc and get similar example programs for different platforms.
Personally I would go more in the direction of creating an example archive of complete programs for specific microcontroller versions (in typical uses), instead of making yet another "general" tutorial. E.g. one of microcontroller x/y that enables a serial port, one that configures a few digital outputs (setting TRIS and friends), how to set up common frequency/oscillator options etc.
When I started with PIC, (very short PIC16, then PIC18 then 24F and now dspic), one of the main problems is that all the examples are either only fragments or describing very general principles.
A tutorial is no good, if it takes more skills to get the examples actually working than the tutorial teaches.
I usually couldn't find one single complete program for exactly my controller, or even for the slightly wider group (that only vary in number of pins and memory/flash).
The initial program was always the problem, but sometimes later I had the same problem (initializing a certain peripheral) all over again (e.g. the encoder) It is specially frustrating if is the first run of a new micro controller line, and you might not be 100% sure of your hardware.
Unfortunately that takes some coordination, from a forum, an user group or so, since nobody has all devices, and all variants to wire them up (e.g. different oscillator options).
I'm now a couple of months into my Smalltalk learning voyage. I was aware, from the beginning that Smalltalk has several "dialects" (perhaps "dialect" isn't the best word) but by this I mean VisualWorks, Squeak and Dolphin to mention just three. So far I have limited my foray to Visualworks and Squeak. But I've now discovered that Squeak seems to be metamorphosing (pun intended!) into several other variants e.g. Tweak, Pharo, Cobalt and Croquet.
Could somebody explain:
a) why these initiatives (Tweak, Pharo, Croquet and Cobalt) have arisen ?
b) should I take time keep abreast - bearing in mind I'm a Smalltalk neophyte?
c) How come such an unpopular language has such a vibrant set of developments happening?
d) Are there other initiatives that I should be aware of? (as a beginner not a computer researcher that is)
As far as 'How come such an unpopular language has such a vibrant set of developments happening?', I have to say that 'popularity' does not correlate with utility or productivity. A contrarian will tell you that the majority is always wrong.
When you get bitten by the Smalltalk bug, you tend to stay bitten. There are many former Smalltalkers who are earning their living working in other languages that miss the language and would jump at the opportunity to earn their living in Smalltalk again.
This phenomenon accounts for the vibrant community.
Personally, I find that I am at my most productive working in Smalltalk. The tools and the language work together to make the gap between idea and execution very small. In Smalltalk when I am faced with using an new library, I can use the debugger to 'parachute' into the middle of the action - viewing state and code in a single tool. You can't duplicate that experience by reading code and studying log files...
Smalltalk has its quirks and the quirks do keep Smalltalk out of the mainstream. But some of the quirks are what make Smalltalk a productive environment to work in, which may mean that it will never be mainstream.
But with a vibrant and active community supporting Smalltalk (in a variety of dialects) does it matter whether Smalltalk is mainstream or not?
A bit of background might be helpful: Tweak was a research effort trying to bring some of the great things from Etoys to the system level (i.e., the player-costume architecture, the concurrency model, "events everywhere", asynchronous notifications etc). Tweak was a "blue-plane" approach to graphics, composition and scripting and in some ways never really intended to be a production tool. That it became one was its downfall because it wasn't polished enough for wide use and by becoming a production tool it became infeasible to implement some of the radical changes that would have been required to make it ready for world dominance ;-)
Croquet had an entirely different goal. We needed Croquet because we needed a bit-identical replicated computation machinery. Croquet computes bit-identically on all platforms which required modifications to the virtual machine and some libraries (such as floating point). Cobalt is a spin-off from Croquet which takes the SDK and builds an application from it. In this sense Cobalt is not really a fork - it is the current focus of the Croquet community.
I don't know about the other initiatives you mention, but Pharo is a fork which aims at producing a version of Squeak without all the cruft (like EToys, for example), better developer support and use of modern (??) technologies like TrueType fonts. It's well worth downloading the current image and having a look - I find it a bit slow on my ancient hardwate, but I intend to keep an eye on it.
This just shows how an inspiring language Smalltalk is and how sound and cleverly designed roots it has. It inspires people from academia to industry to try to extend and build new "dialects", which are then usually merged to some extend among themselves so that at the end we all profit.
That's why I like Smalltalk and its community/communities, even that sometimes you feel tensions there. But every progress needs a tension first.
Pharo is a result of such tension for instance. Pharo is a fork of Squeak, by group of Squeakers with a strong leadership and work more/talk less mentality, which already show the results and it will for sure move Squeak if not all Smalltalk a step further.
I think there are that initiatives or forks because the community is able to do it :) This small smalltalk community is stuffed up with smart guys that know what they do. There is enough knowledge about virtual machines, language design and such.
On the other hand it is like every other community, too. There are people with different opinions. So it is only a matter of time until a few people start "something slightly different" to check/realize their ideas. And they do because they can.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
Background
Last year, I did an internship in a physics research group at a university. In this group, we mostly used LabVIEW to write programs for controlling our setups, doing data acquisition and analyzing our data. For the first two purposes, that works quite OK, but for data analysis, it's a real pain. On top of that, everyone was mostly self-taught, so code that was written was generally quite a mess (no wonder that every PhD quickly decided to rewrite everything from scratch). Version control was unknown, and impossible to set up because of strict software and network regulations from the IT department.
Now, things actually worked out surprisingly OK, but how do people in the natural sciences do their software development?
Questions
Some concrete questions:
What languages/environments have you used for developing scientific software, especially data analysis? What libraries? (for example, what do you use for plotting?)
Was there any training for people without any significant background in programming?
Did you have anything like version control, and bug tracking?
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (especially physicists are stubborn people!)
Summary of answers thus far
The answers (or my interpretation of them) thus far: (2008-10-11)
Languages/packages that seem to be the most widely used:
LabVIEW
Python
with SciPy, NumPy, PyLab, etc. (See also Brandon's reply for downloads and links)
C/C++
MATLAB
Version control is used by nearly all respondents; bug tracking and other processes are much less common.
The Software Carpentry course is a good way to teach programming and development techniques to scientists.
How to improve things?
Don't force people to follow strict protocols.
Set up an environment yourself, and show the benefits to others. Help them to start working with version control, bug tracking, etc. themselves.
Reviewing other people's code can help, but be aware that not everyone may appreciate that.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I used to work for Enthought, the primary corporate sponsor of SciPy. We collaborated with scientists from the companies that contracted Enthought for custom software development. Python/SciPy seemed to be a comfortable environment for scientists. It's much less intimidating to get started with than say C++ or Java if you're a scientist without a software background.
The Enthought Python Distribution comes with all the scientific computing libraries including analysis, plotting, 3D visualation, etc.
Was there any training for people without any significant background in programming?
Enthought does offer SciPy training and the SciPy community is pretty good about answering questions on the mailing lists.
Did you have anything like version control, bug tracking?
Yes, and yes (Subversion and Trac). Since we were working collaboratively with the scientists (and typically remotely from them), version control and bug tracking were essential. It took some coaching to get some scientists to internalize the benefits of version control.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
Make sure they are familiarized with the tool chain. It takes an investment up front, but it will make them feel less inclined to reject it in favor of something more familiar (Excel). When the tools fail them (and they will), make sure they have a place to go for help — mailing lists, user groups, other scientists and software developers in the organization. The more help there is to get them back to doing physics the better.
The course Software Carpentry is aimed specifically at people doing scientific computing and aims to teach the basics and lessons of software engineering, and how best to apply them to projects.
It covers topics like version control, debugging, testing, scripting and various other issues.
I've listened to about 8 or 9 of the lectures and think it is to be highly recommended.
Edit: The MP3s of the lectures are available as well.
Nuclear/particle physics here.
Major programing work used to be done mostly in Fortran using CERNLIB (PAW, MINUIT, ...) and GEANT3, recently it has mostly been done in C++ with ROOT and Geant4. There are a number of other libraries and tools in specialized use, and LabVIEW sees some use here and there.
Data acquisition in my end of this business has often meant fairly low level work. Often in C, sometimes even in assembly, but this is dying out as the hardware gets more capable. On the other hand, many of the boards are now built with FPGAs which need gate twiddling...
One-offs, graphical interfaces, etc. use almost anything (Tcl/Tk used to be big, and I've been seeing more Perl/Tk and Python/Tk lately) including a number of packages that exist mostly inside the particle physics community.
Many people writing code have little or no formal training, and process is transmitted very unevenly by oral tradition, but most of the software group leaders take process seriously and read as much as necessary to make up their deficiencies in this area.
Version control for the main tools is ubiquitous. But many individual programmers neglect it for their smaller tasks. Formal bug tracking tools are less common, as are nightly builds, unit testing, and regression tests.
To improve things:
Get on the good side of the local software leaders
Implement the process you want to use in your own area, and encourage those you let in to use it too.
Wait. Physicists are empirical people. If it helps, they will (eventually!) notice.
One more suggestion for improving things.
Put a little time in to helping anyone you work directly with. Review their code. Tell them about algorithmic complexity/code generation/DRY or whatever basic thing they never learned because some professor threw a Fortran book at them once and said "make it work". Indoctrinate them on process issues. They are smart people, and they will learn if you give them a chance.
This might be slightly tangential, but hopefully relevant.
I used to work for National Instruments, R&D, where I wrote software for NI RF & Communication toolkits. We used LabVIEW quite a bit, and here are the practices we followed:
Source control. NI uses Perforce. We did the regular thing - dev/trunk branches, continuous integration, the works.
We wrote automated test suites.
We had a few people who came in with a background in signal processing and communication. We used to have regular code reviews, and best practices documents to make sure their code was up to the mark.
Despite the code reviews, there were a few occasions when "software guys", like me had to rewrite some of this code for efficiency.
I know exactly what you mean about stubborn people! We had folks who used to think that pointing out a potential performance improvement in their code was a direct personal insult! It goes without saying that that this calls for good management. I thought the best way to deal with these folks is to go slowly, not press to hard for changes and if necessary be prepared to do the dirty work. [Example: write a test suite for their code].
I'm not exactly a 'natural' scientist (I study transportation) but am an academic who writes a lot of my own software for data analysis. I try to write as much as I can in Python, but sometimes I'm forced to use other languages when I'm working on extending or customizing an existing software tool. There is very little programming training in my field. Most folks are either self-taught, or learned their programming skills from classes taken previously or outside the discipline.
I'm a big fan of version control. I used Vault running on my home server for all the code for my dissertation. Right now I'm trying to get the department to set up a Subversion server, but my guess is I will be the only one who uses it, at least at first. I've played around a bit with FogBugs, but unlike version control, I don't think that's nearly as useful for a one-man team.
As for encouraging others to use version control and the like, that's really the problem I'm facing now. I'm planning on forcing my grad students to use it on research projects they're doing for me, and encouraging them to use it for their own research. If I teach a class involving programming, I'll probably force the students to use version control there too (grading them on what's in the repository). As far as my colleagues and their grad students go, all I can really do is make a server available and rely on gentle persuasion and setting a good example. Frankly, at this point I think it's more important to get them doing regular backups than get them on source control (some folks are carrying around the only copy of their research data on USB flash drives).
1.) Scripting languages are popular these days for most things due to better hardware. Perl/Python/Lisp are prevalent for lightweight applications (automation, light computation); I see a lot of Perl at my work (computational EM) since we like Unix/Linux. For performance stuff, C/C++/Fortran are typically used. For parallel computing, well, we usually manually parallelize runs in EM as opposed to having a program implicitly do it (ie split up the jobs by look angle when computing radar cross sections).
2.) We just kind of throw people into the mix here. A lot of the code we have is very messy, but scientists are typically a scatterbrained bunch that don't mind that sort of thing. Not ideal, but we have things to deliver and we're severely understaffed. We're slowly getting better.
3.) We use SVN; however, we do not have bug tracking software. About as good as it gets for us is a txt file that tells you where bugs specific bugs are.
4.) My suggestion for implementing best practices for scientists: do it slowly. As scientists, we typically don't ship products. No one in science makes a name for himself by having clean, maintainable code. They get recognition from the results of that code, typically. They need to see justification for spending time on learning software practices. Slowly introduce new concepts and try to get them to follow; they're scientists, so after their own empirical evidence confirms the usefulness of things like version control, they will begin to use it all the time!
I'd highly recommend reading "What Every Computer Scientist Should Know About Floating-Point Arithmetic". A lot of problems I encounter on a regular basis come from issues with floating point programming.
I am a physicist working in the field of condensed matter physics, building classical and quantum models.
Languages:
C++ -- very versatile: can be used for anything, good speed, but it can be a bit inconvenient when it comes to MPI
Octave -- good for some supplementary calculations, very convenient and productive
Libraries:
Armadillo/Blitz++ -- fast array/matrix/cube abstractions for C++
Eigen/Armadillo -- linear algebra
GSL -- to use with C
LAPACK/BLAS/ATLAS -- extremely big and fast, but less convenient (and written in FORTRAN)
Graphics:
GNUPlot -- it has very clean and neat output, but not that productive sometimes
Origin -- very convenient for plotting
Development tools:
Vim + plugins -- it works great for me
GDB -- a great debugging tool when working with C/C++
Code::Blocks -- I used it for some time and found it quite comfortable, but Vim is still better in my opinion.
I work as a physicist in a UK university.
Perhaps I should emphasise that different areas of research have different emphasis on programming. Particle physicists (like dmckee) do computational modelling almost exclusively and may collaborate on large software projects, whereas people in fields like my own (condensed matter) write code relatively infrequently. I suspect most scientists fall into the latter camp. I would say coding skills are usually seen as useful in physics, but not essential, much like physics/maths skills are seen as useful for programmers but not essential. With this in mind...
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Commonly data analysis and plotting is done using generic data analysis packages such as IGOR Pro, ORIGIN, Kaleidegraph which can be thought of as 'Excel plus'. These packages typically have a scripting language that can be used to automate. More specialist analysis may have a dedicated utility for the job that generally will have been written a long time ago, no-one has the source for and is pretty buggy. Some more techie types might use the languages that have been mentioned (Python, R, MatLab with Gnuplot for plotting).
Control software is commonly done in LabVIEW, although we actually use Delphi which is somewhat unusual.
Was there any training for people without any significant background in programming?
I've been to seminars on grid computing, 3D visualisation, learning Boost etc. given by both universities I've been at. As an undergraduate we were taught VBA for Excel and MatLab but C/MatLab/LabVIEW is more common.
Did you have anything like version control, bug tracking?
No, although people do have personal development setups. Our code base is in a shared folder on a 'server' which is kept current with a synching tool.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
One step at a time! I am trying to replace the shared folder with something a bit more solid, perhaps finding a SVN client which mimics the current synching tools behaviour would help.
I'd say though on the whole, for most natural science projects, time is generally better spent doing research!
Ex-academic physicist and now industrial physicist UK here:
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
I mainly use MATLAB these days (easy to access visualisation functions and maths). I used to use Fortran a lot and IDL. I have used C (but I'm more a reader than a writer of C), Excel macros (ugly and confusing). I'm currently needing to be able to read Java and C++ (but I can't really program in them) and I've hacked Python as well. For my own entertainment I'm now doing some programming in C# (mainly to get portability / low cost / pretty interfaces). I can write Fortran with pretty much any language I'm presented with ;-)
Was there any training for people without any significant background in programming?
Most (all?) undergraduate physics course will have a small programming course usually on C, Fortran or MATLAB but it's the real basics. I'd really like to have had some training in software engineering at some point (revision control / testing / designing medium scale systems)
Did you have anything like version control, bug tracking?
I started using Subversion / TortoiseSVN relatively recently. Groups I've worked with in the past have used revision control. I don't know any academic group which uses formal bug tracking software. I still don't use any sort of systematic testing.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
I would try to introduce some software engineering ideas at undergraduate level and then reinforce them by practice at graduate level, also provide pointers to resources like the Software Carpentry course mentioned above.
I'd expect that a significant fraction of academic physicists will be writing software (not necessarily all though) and they are in dire need of at least an introduction to ideas in software engineering.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
Python, NumPy and pylab (plotting).
Was there any training for people without any significant background in programming?
No, but I was working in a multimedia research lab, so almost everybody had a computer science background.
Did you have anything like version control, bug tracking?
Yes, Subversion for version control, Trac for bug tracing and wiki. You can get free bug tracker/version control hosting from http://www.assembla.com/ if their TOS fits your project.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!).
Make sure the infrastructure is set up and well maintained and try to sell the benefits of source control.
I'm a statistician at a university in the UK. Generally people here use R for data analysis, it's fairly easy to learn if you know C/Perl. Its real power is in the way you can import and modify data interactively. It's very easy to take a number of say CSV (or Excel) files and merge them, create new columns based on others and then throw that into a GLM, GAM or some other model. Plotting is trivial too and doesn't require knowledge of a whole new language (like PGPLOT or GNUPLOT.) Of course, you also have the advantage of having a bunch of built-in features (from simple things like mean, standard deviation etc all the way to neural networks, splines and GL plotting.)
Having said this, there are a couple of issues. With very large datasets R can become very slow (I've only really seen this with >50,000x30 datasets) and since it's interpreted you don't get the advantage of Fortran/C in this respect. But, you can (very easily) get R to call C and Fortran shared libraries (either from something like netlib or ones you've written yourself.) So, a usual workflow would be to:
Work out what to do.
Prototype the code in R.
Run some preliminary analyses.
Re-write the slow code into C or Fortran and call that from R.
Which works very well for me.
I'm one of the only people in my department (of >100 people) using version control (in my case using git with githuib.com.) This is rather worrying, but they just don't seem to be keen on trying it out and are content with passing zip files around (yuck.)
My suggestion would be to continue using LabView for the acquisition (and perhaps trying to get your co-workers to agree on a toolset for acquisition and making is available for all) and then move to exporting the data into a CSV (or similar) and doing the analysis in R. There's really very little point in re-inventing the wheel in this respect.
What languages/environments have you used for developing scientific software, esp. data analysis? What libraries? (E.g., what do you use for plotting?)
My undergraduate physics department taught LabVIEW classes and used it extensively in its research projects.
The other alternative is MATLAB, in which I have no experience. There are camps for either product; each has its own advantages/disadvantages. Depending on what kind of problems you need to solve, one package may be more preferable than the other.
Regarding data analysis, you can use whatever kind of number cruncher you want. Ideally, you can do the hard calculations in language X and format the output to plot nicely in Excel, Mathcad, Mathematica, or whatever the flavor du jour plotting system is. Don't expect standardization here.
Did you have anything like version control, bug tracking?
Looking back, we didn't, and it would have been easier for us all if we did. Nothing like breaking everything and struggling for hours to fix it!
Definitely use source control for any common code. Encourage individuals to write their code in a manner that could be made more generic. This is really just coding best practices. Really, you should have them teaching (or taking) a computer science class so they can get the basics.
How would you go about trying to create a decent environment for programming, without getting too much in the way of the individual scientists (esp. physicists are stubborn people!)
There is a clear split between data aquisition (DAQ) and data analysis. Meaning, it's possible to standardize on the DAQ and then allow the scientists to play with the data in the program of their choice.
Another good option is Scilab. It has graphic modules à la LabVIEW, it has its own programming language and you can also embed Fortran and C code, for example. It's being used in public and private sectors, including big industrial companies. And it's free.
About versioning, some prefer Mercurial, as it gives more liberties managing and defining the repositories. I have no experience with it, however.
For plotting I use Matplotlib. I will soon have to make animations, and I've seen good results using MEncoder. Here is an example including an audio track.
Finally, I suggest going modular, this is, trying to keep main pieces of code in different files, so code revision, understanding, maintenance and improvement will be easier. I have written, for example, a Python module for file integrity testing, another for image processing sequences, etc.
You should also consider developing with the use a debugger that allows you to check variable contents at settable breakpoints in the code, instead using print lines.
I have used Eclipse for Python and Fortran developing (although I got a false bug compiling a Fortran short program with it, but it may have been a bad configuration) and I'm starting to use the Eric IDE for Python. It allows you to debug, manage versioning with SVN, it has an embedded console, it can do refactoring with Bicycle Repair Man (it can use another one, too), you have Unittest, etc. A lighter alternative for Python is IDLE, included with Python since version 2.3.
As a few hints, I also suggest:
Not using single-character variables. When you want to search appearances, you will get results everywhere. Some argue that a decent IDE makes this easier, but then you will depend on having permanent access to the IDE. Even using ii, jj and kk can be enough, although this choice will depend on your language. (Double vowels would be less useful if code comments are made in Estonian, for instance).
Commenting the code from the very beginning.
For critical applications sometimes it's better to rely on older language/compiler versions (major releases), more stable and better debugged.
Of course you can have more optimized code in later versions, fixed bugs, etc, but I'm talking about using Fortran 95 instead of 2003, Python 2.5.4 instead of 3.0, or so. (Specially when a new version breaks backwards compatibility.) Lots of improvements usually introduce lots of bugs. Still, this will depend on specific application cases!
Note that this is a personal choice, many people could argue against this.
Use redundant and automated backup! (With versioning control).
Definitely, use Subversion to keep current, work-in-progress, and stable snapshot copies of source code. This includes C++, Java etc. for homegrown software tools, and quickie scripts for one-off processing.
With the strong leaning in science and applied engineering toward "lone cowboy" development methodology, the usual practice of organizing the repository into trunk, tag and whatever else it was - don't bother! Scientists and their lab technicians like to twirl knobs, wiggle electrodes and chase vacuum leaks. It's enough of a job to get everyone to agree to, say Python/NumPy or follow some naming convention; forget trying to make them follow arcane software developer practices and conventions.
For source code management, centralized systems such as Subversion are superior for scientific use due to the clear single point of truth (SPOT). Logging of changes and ability to recall versions of any file, without having chase down where to find something, has huge record-keeping advantages. Tools like Git and Monotone: oh my gosh the chaos I can imagine that would follow! Having clear-cut records of just what version of hack-job scripts were used while toying with the new sensor when that Higgs boson went by or that supernova blew up, will lead to happiness.
What languages/environments have you
used for developing scientific
software, esp. data analysis? What
libraries? (E.g., what do you use for
plotting?)
Languages I have used for numerics and sicentific-related stuff:
C (slow development, too much debugging, almost impossible to write reusable code)
C++ (and I learned to hate it -- development isn't as slow as C, but can be a pain. Templates and classes were cool initially, but after a while I realized that I was fighting them all the time and finding workarounds for language design problems
Common Lisp, which was OK, but not widely used fo Sci computing. Not easy to integrate with C (if compared to other languages), but works
Scheme. This one became my personal choice.
My editor is Emacs, although I do use vim for quick stuff like editing configuration files.
For plotting, I usually generate a text file and feed it into gnuplot.
For data analysis, I usually generate a text file and use GNU R.
I see lots of people here using FORTRAN (mostly 77, but some 90), lots of Java and some Python. I don't like those, so I don't use them.
Was there any training for people
without any significant background in
programming?
I think this doesn't apply to me, since I graduated in CS -- but where I work there is no formal training, but people (Engineers, Physicists, Mathematicians) do help each other.
Did you have anything like version
control, bug tracking?
Version control is absolutely important! I keep my code and data in three different machines, in two different sides of the world -- in Git repositories. I sync them all the time (so I have version control and backups!) I don't do bug control, although I may start doing that.
But my colleagues don't BTS or VCS at all.
How would you go about trying to
create a decent environment for
programming, without getting too much
in the way of the individual
scientists (esp. physicists are
stubborn people!)
First, I'd give them as much freedom as possible. (In the University where I work I could chooe between having someone install Ubuntu or Windows, or install my own OS -- I chose to install my own. I don't have support from them and I'm responsible for anything that happens with my machins, including security issues, but I do whatever I want with the machine).
Second, I'd see what they are used to, and make it work (need FORTRAN? We'll set it up. Need C++? No problem. Mathematica? OK, we'll buy a license). Then see how many of them would like to learn "additional tools" to help them be more productive (don't say "different" tools. Say "additional", so it won't seem like anyone will "lose" or "let go" or whatever). Start with editors, see if there are groups who would like to use VCS to sync their work (hey, you can stay home and send your code through SVN or GIT -- wouldn't that be great?) and so on.
Don't impose -- show examples of how cool these tools are. Make data analysis using R, and show them how easy it was. Show nice graphics, and explain how you've created them (but start with simple examples, so you can quickly explain them).
I would suggest F# as a potential candidate for performing science-related manipulations given its strong semantic ties to mathematical constructs.
Also, its support for units-of-measure, as written about here makes a lot of sense for ensuring proper translation between mathematical model and implementation source code.
First of all, I would definitely go with a scripting language to avoid having to explain a lot of extra things (for example manual memory management is - mostly - ok if you are writing low-level, performance sensitive stuff, but for somebody who just wants to use a computer as an upgraded scientific calculator it's definitely overkill). Also, look around if there is something specific for your domain (as is R for statistics). This has the advantage of already working with the concepts the users are familiar with and having specialized code for specific situations (for example calculating standard deviations, applying statistical tests, etc in the case of R).
If you wish to use a more generic scripting language, I would go with Python. Two things it has going for it are:
The interactive shell where you can experiment
Its clear (although sometimes lengthy) syntax
As an added advantage, it has libraries for most of the things you would want to do with it.
I'm no expert in this area, but I've always understood that this is what MATLAB was created for. There is a way to integrate MATLAB with SVN for source control as well.