Code Optimization with Scala - optimization

What structures of Scala can be used more efficiently than in Java, to increase execution speed? I don't know if this is possible, but to clear my doubts :)
Thanks

The scala #specialized annotation can generate multiple versions of a class, fine-tuned with specific primitive types. You can write all of that out in Java, but you probably don't want to.

To expand on Ross's answer, you can use #specialized to generate specific versions of a collection. For instance, in Java you'd generally use fastutil or Apache Primitives for collections of primitives. Scala's #specialized will generate these variants for you and hide them automatically like so:
class MyLinkedList[#specialized T] (args: T*) {
// whatever it does
}
Other than that, actors make it easier to write concurrent applications. Coming up in 2.9 are parallel collections, which can apply higher-order functions in parallel across collections, speeding up any place you'd have the Scala equivalent of a Java loop (fold, foreach, etc). See this ScalaDays talk for the nitty-gritty on this.

As of 2.9, the parallel collections library is slated to be part of the standard distribution. This will allow extremely simple distribution of so-called "embarrassingly parallel" problems over multiple cores. Doing so in Java takes considerably more effort.
As a general rule, Scala benchmarks range from moderately slower than Java to slightly faster, depending on the problem and coding techniques.

I'll refrain from speculation on how the resulting performance might differ from an equivalent Java construct, but Scala does closure elimination, which might make a measurable difference, modulo HotSpot tricks.
Also stay tuned for Iulian's thesis which should be out soon and will provide a lot more information on the subject of Scala optimization.

Related

OOP Performance after compilation

I got into a conversation with someone about OOP, who said that OOP costs to much performance. Now I know that in some cases it might, but as I see it, it would depend on different things.
Language execution. In languages using an interpretor, I can see that it could be a possibility. But what about compiled language like C++ or half compiled like Java? In any case it would just slow down the compilation vs. C, but as native or byte code I would think that the compilers would have optimized it to a point where this is not a problem.
Language structure. If we take PHP as an example, it is quite a flexible language with little rules. Java on the other hand uses strict naming schemes, strict file structure rules and is strict about data types. This speeds up lookup quite a bit. What if we used the same rules in PHP? Made it 100% OOP and adapted the same rules as Java has, would this not speed up PHP?
I found a really great OOP example, but this example does not prove the upside of OOP, but rather the upside of overview and structure. It's no problem using PP to do the same, at least not in PHP.
OOP is a very moot term and that's why your question is moot as well.
On the most generic level, OOP is about objects (let's not dive into what they are) encapsulating some state and passing each other messages to enquire or change that state. As you can see, these objects might be processes running on separate network-connected machines and message passing might be done quite literally—by passing messages of some application-level protocol over that network; this is the one extreme. The opposite edge of this spectrum is, say, C or C++ or Object Pascal etc which are compiled down to machine instructions and in which objects are just memory regions. I reckon the only "interesting" topic is a language on this side of the OOP spectrum, right?
In this, down-to-machine, level, the only relevant slow down I perceive is dynamic dispatch which is what is typically used to implement implementation inheritance (class Bar extends class Foo as in PHP) which allows you to pass objects of derived classes to the code expecting objects of their base class. This is typically requires a lookup through the table of methods at runtime to select a relevant method.
Note that this is not somehow inherent to the concept of OOP. For instance, dynamic lookup like this has been routinely used in plain C code even before C++ came to existence, and C is not an OOP language.
What I'm leading you to, is that some ways to access data and code cost more than others in terms of performance but provide powerful programming tools. Picking such an algorythm while considering a resulting tradeoff is not at all peculiar to implementing OOP concepts and happens in any computing field and any computing paradigm or a combinations of them.
In the end, I would say that the most visible slow-downs will come not from the code running on a CPU but rather from the runtime system. For instance, PHP is known for its ability to dynamically load code at runtime. Does this count as a feature of it being an OOP-enabled language? On the one hand, in these days of heavy frameworks, when PHP loads something it usually loads definitions of classes. On the other hand, if these frameworks were, say, purely procedural the same performance cost would be incurred (as the most loading time is spent waiting on I/O). Interpreted or JIT-compiled languages have to interpret or compile the code they execute and this incurs pefrormance hits. Does this depend on some of these languages implementing OOP concepts? Unlikely, IMO.

Anyone program "low-level" on JVM?

Sometimes we hear about brave people who understand and write assembly language for performance reasons, as opposed to using a compiler with a high-level language. Can the same be done on the JVM? I've reviewed the JVM instruction set, and it resembles assembly language in some respects, though it's much higher level (I'm assuming that the system-specific implementations of the JVM are extremely efficient).
Is it possible to, say, write JVM instructions and put them into a Java-executable binary?
Yes. You can do this via the asm library.
In fact, this is typically how people implement non-Java languages on top of the JVM, and how many Java metaprogramming libraries work.
You may very well want to do this for the same kind of metaprogramming capabilities - e.g., generating classes at runtime, or using the InvokeDynamic instruction to generate your own method dispatch rules.
There isn't a whole lot of performance benefit to be gained from using raw Java bytecode rather than writing the corresponding high-level Java (the JIT is your main performance booster, and it's optimized for the sorts of patterns "vanilla" Java code generates) but it does give you flexibility for things that are difficult, verbose, or impossible to express in Java.

OCaml modules and performance

Some functions are really easy to implement in OCaml (for example, map from a list) but you may use the map of the OCaml library: List.map
However, we can wonder which code will be more efficient. Calling a module of a separate compilation unit (a library) may void some possible optimizations. I read in the news group fa.caml that when calling functions from libraries, closures are used.
I have OCaml code in production that use Modules and Functors for doing generic programming. For historical reason my code is monolitic: all in one file. Now I have more time, I'm willing to separate the code into files for such modules. However, I'm afraid I can lost performance, as it took me a while to get it right. For example, I have modules for wrapping complex objects with numbers, so I enforce unique representation and fast comparison. I use those wrapped objects with generic Maps, Sets, and build caches upon them.
The questions are:
Am I going to loose performance if I move to separate files?
Is OCaml doing many optimizations on my code full of modules, functors, etc?
In C++, if you define class method in a .h, the compiler may end up inlining short methods, etc. Is it possible to achieve that in OCaml using separated files?
You may lose some performance. However, there are two mitigating factors:
The OCaml native code compiler can do cross-module inlining, so it is possible for code to be inlined even across the separate compilation units (with a couple caveats - recursive functions and function arguments are not inlined across modules[1]).
The code will still quite possibly be fast enough, and the gains in readability and maintainability will quite possibly outweigh any (marginal) performance cost.
I do not know if OCaml defunctorizes code where the functors are defined in the same source file. If it does not, then modules shouldn't add any performance hit above that already incurred by the functors.
In general, it is my opinion that it is best to write straightforward, readable, maintainable code and not worry too much about microscopic performance characteristics like this unless the code proves to be too slow in practice.

OOP vs Functional Programming vs Procedural [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
What are the differences between these programming paradigms, and are they better suited to particular problems or do any use-cases favour one over the others?
Architecture examples appreciated!
All of them are good in their own ways - They're simply different approaches to the same problems.
In a purely procedural style, data tends to be highly decoupled from the functions that operate on it.
In an object oriented style, data tends to carry with it a collection of functions.
In a functional style, data and functions tend toward having more in common with each other (as in Lisp and Scheme) while offering more flexibility in terms of how functions are actually used. Algorithms tend also to be defined in terms of recursion and composition rather than loops and iteration.
Of course, the language itself only influences which style is preferred. Even in a pure-functional language like Haskell, you can write in a procedural style (though that is highly discouraged), and even in a procedural language like C, you can program in an object-oriented style (such as in the GTK+ and EFL APIs).
To be clear, the "advantage" of each paradigm is simply in the modeling of your algorithms and data structures. If, for example, your algorithm involves lists and trees, a functional algorithm may be the most sensible. Or, if, for example, your data is highly structured, it may make more sense to compose it as objects if that is the native paradigm of your language - or, it could just as easily be written as a functional abstraction of monads, which is the native paradigm of languages like Haskell or ML.
The choice of which you use is simply what makes more sense for your project and the abstractions your language supports.
I think the available libraries, tools, examples, and communities completely trumps the paradigm these days. For example, ML (or whatever) might be the ultimate all-purpose programming language but if you can't get any good libraries for what you are doing you're screwed.
For example, if you're making a video game, there are more good code examples and SDKs in C++, so you're probably better off with that. For a small web application, there are some great Python, PHP, and Ruby frameworks that'll get you off and running very quickly. Java is a great choice for larger projects because of the compile-time checking and enterprise libraries and platforms.
It used to be the case that the standard libraries for different languages were pretty small and easily replicated - C, C++, Assembler, ML, LISP, etc.. came with the basics, but tended to chicken out when it came to standardizing on things like network communications, encryption, graphics, data file formats (including XML), even basic data structures like balanced trees and hashtables were left out!
Modern languages like Python, PHP, Ruby, and Java now come with a far more decent standard library and have many good third party libraries you can easily use, thanks in great part to their adoption of namespaces to keep libraries from colliding with one another, and garbage collection to standardize the memory management schemes of the libraries.
These paradigms don't have to be mutually exclusive. If you look at python, it supports functions and classes, but at the same time, everything is an object, including functions. You can mix and match functional/oop/procedural style all in one piece of code.
What I mean is, in functional languages (at least in Haskell, the only one I studied) there are no statements! functions are only allowed one expression inside them!! BUT, functions are first-class citizens, you can pass them around as parameters, along with a bunch of other abilities. They can do powerful things with few lines of code.
While in a procedural language like C, the only way you can pass functions around is by using function pointers, and that alone doesn't enable many powerful tasks.
In python, a function is a first-class citizen, but it can contain arbitrary number of statements. So you can have a function that contains procedural code, but you can pass it around just like functional languages.
Same goes for OOP. A language like Java doesn't allow you to write procedures/functions outside of a class. The only way to pass a function around is to wrap it in an object that implements that function, and then pass that object around.
In Python, you don't have this restriction.
For GUI I'd say that the Object-Oriented Paradigma is very well suited. The Window is an Object, the Textboxes are Objects, and the Okay-Button is one too. On the other Hand stuff like String Processing can be done with much less overhead and therefore more straightforward with simple procedural paradigma.
I don't think it is a question of the language neither. You can write functional, procedural or object-oriented in almost any popular language, although it might be some additional effort in some.
In order to answer your question, we need two elements:
Understanding of the characteristics of different architecture styles/patterns.
Understanding of the characteristics of different programming paradigms.
A list of software architecture styles/pattern is shown on the software architecture article on Wikipeida. And you can research on them easily on the web.
In short and general, Procedural is good for a model that follows a procedure, OOP is good for design, and Functional is good for high level programming.
I think you should try reading the history on each paradigm and see why people create it and you can understand them easily.
After understanding them both, you can link the items of architecture styles/patterns to programming paradigms.
I think that they are often not "versus", but you can combine them. I also think that oftentimes, the words you mention are just buzzwords. There are few people who actually know what "object-oriented" means, even if they are the fiercest evangelists of it.
One of my friends is writing a graphics app using NVIDIA CUDA. Application fits in very nicely with OOP paradigm and the problem can be decomposed into modules neatly. However, to use CUDA you need to use C, which doesn't support inheritance. Therefore, you need to be clever.
a) You devise a clever system which will emulate inheritance to a certain extent. It can be done!
i) You can use a hook system, which expects every child C of parent P to have a certain override for function F. You can make children register their overrides, which will be stored and called when required.
ii) You can use struct memory alignment feature to cast children into parents.
This can be neat but it's not easy to come up with future-proof, reliable solution. You will spend lots of time designing the system and there is no guarantee that you won't run into problems half-way through the project. Implementing multiple inheritance is even harder, if not almost impossible.
b) You can use consistent naming policy and use divide and conquer approach to create a program. It won't have any inheritance but because your functions are small, easy-to-understand and consistently formatted you don't need it. The amount of code you need to write goes up, it's very hard to stay focused and not succumb to easy solutions (hacks). However, this ninja way of coding is the C way of coding. Staying in balance between low-level freedom and writing good code. Good way to achieve this is to write prototypes using a functional language. For example, Haskell is extremely good for prototyping algorithms.
I tend towards approach b. I wrote a possible solution using approach a, and I will be honest, it felt very unnatural using that code.

Which scripting language to support in an existing codebase?

I'm looking at adding scripting functionality to an existing codebase and am weighing up the pros/cons of various packages. Lua is probably the most obvious choice, but I was wondering if people have any other suggestions based on their experience.
Scripts will be triggered upon certain events and may stay resident for a period of time. For example upon startup a script may define several options which the program presents to the user as a number of buttons. Upon selecting one of these buttons the program will notify the script where further events may occur.
These are the only real requirements;
Must be a cross-platform library that is compilable from source
Scripts must be able to call registered code-side functions
Code must be able to call script-side functions
Be used within a C/C++ codebase.
Based on my own experience:
Python. IMHO this is a good choice. We have a pretty big code base with a lot of users and they like it a lot.
Ruby. There are some really nice apps such as Google Sketchup that use this. I wrote a Sketchup plugin and thought it was pretty nice.
Tcl. This is the old-school embeddable scripting language of choice, but it doesn't have a lot of momentum these days. It's high quality though, they use it on the Hubble Space Telescope!
Lua. I've only done baby stuff with it but IIRC it only has a floating point numeric type, so make sure that's not a problem for the data you will be working with.
We're lucky to be living in the golden age of scripting, so it's hard to make a bad choice if you choose from any of the popular ones.
I have played around a little bit with Spidermonkey. It seems like it would at least be worth a look at in your situation. I have heard good things about Lua as well. The big argument for using a javascript scripting language is that a lot of developers know it already and would probably be more comfortable from the get go, whereas Lua most likely would have a bit of a learning curve.
I'm not completely positive but I think that spidermonkey your 4 requirements.
I've used Python extensively for this purpose and have never regretted it.
Lua is has the most straight-forward C API for binding into a code base that I've ever used. In fact, I usually quickly roll bindings for it by hand. Whereas, you often wouldn't consider doing so without a generator like swig for others. Also, it's typically faster and more light weight than the alternatives, and coroutines are a very useful feature that few other languages provide.
AngelScript
lets you call standard C functions and C++ methods with no need for proxy functions. The application simply registers the functions, objects, and methods that the scripts should be able to work with and nothing more has to be done with your code. The same functions used by the application internally can also be used by the scripting engine, which eliminates the need to duplicate functionality.
For the script writer the scripting language follows the widely known syntax of C/C++ (with minor changes), but without the need to worry about pointers and memory leaks.
The original question described Tcl to a "T".
Tcl was designed from the beginning to be an embedded scripting language. It has evolved to be a first class dynamic language in its own right but still is used all over the world as an embeded language. It is available under the BSD license so it is just about as free as it gets. It also compiles on pretty much any moden platform, and many not-so-modern. And not only does it work on desktop systems, there are variations available for mobile platforms.
Tcl excels as a "glue" language, where you can write performance-intensive functions in C while still benefiting from the advantages of a scripting language for less performance critical parts of the application.
Tcl also comes with a first class GUI toolkit (Tk) that is arguably one of the easiest cross platform GUI toolkits available. It also interfaces very nicely with SQLite and other databases, and has had built-in support for unicode for quite some time.
If the scripting interface will be made available to your customers (as opposed to simply enabling your own engineers to work at the scripting level), Tcl is extremely easy to learn as there are a total of only 12 rules that govern the entire language (as of tcl 8.6). In fact, Tcl shines as a way to invent domain specific languages which is often how it is used as an end-user scripting solution.
There were some excellent suggestions already, but I just wanted to mention that Perl can also be called / can call to C/C++.
You probably could use any modern scripting / bytecode language.
If you're willing to put up with the growing pains of a new product, you could use the Parrot VM. Which has support for many, if not all of the languages listed on this page. Unfortunately it's not done yet, but that hasn't stopped some people from using it in a production environment.
I think most people are probably mentioning the scripting language that they are most familiar with. From my perspective, Tcl was designed specifically to interface with C, so your problem domain is tailor-made for the language. However, I'm sure Python, Perl, or Lua would be fine. You should probably choose the language that is most familiar to your current team, since that will reduce the learning time.