How to optimise Groovy code?

There are lots of guides around the internet on how to optimise Java code, but not so much for Groovy. What are the Groovy-specific things to watch out for when a piece of Groovy code needs to run faster?

Preliminary: This answer is specific to non-indy Groovy code running on the HotSpot VM (OpenJDK, Oracle JVM), version 1.8. The indy option promises to improve dynamic calls a lot, but I don't have experience with it.
Standard disclaimer on any optimisation: MEASURE FIRST! The slowest part of the code is likely somewhere you did not expect. For JVM code I have used honest-profiler; it has its limitations, but for CPU-bound loads it is certainly much better than the profilers that depend on JVM safepoints. And it is free.
Optimising Groovy code:
Apply @CompileStatic to all the hot code! Make sure you don't miss an intermediate hot method somewhere. Without @CompileStatic, the JVM's optimisation capabilities are useless.
Avoid Groovy collection methods. Groovy collection methods (each, collect, etc.) are slow because they use closures, which cannot be inlined properly by the JVM. Replace such loops with Java-style for and for-each loops. For simple loops this is just as readable as the Groovy syntax. For more complex cases, look at the native Java collection methods or libraries like Guava.
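For illustration, a minimal sketch in Java syntax of the kind of loop being recommended (the class and names are illustrative):

import java.util.List;

class LoopExample {
    // A plain for-each loop in place of numbers.each { ... }: no closure object
    // is created and the loop body is easy for the JIT to inline.
    static int sum(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) {
            total += n;
        }
        return total;
    }
}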
Avoid closures if you can. Closure invocations are slow. Even if all your code is @CompileStatic, closure invocations still go through Groovy's dynamic dispatch logic, which is slow and blocks JVM optimisations. An alternative can be a plain old Java-style inner class with a single method. Unfortunately this is ugly, and Groovy does not support Java's anonymous class syntax. But be aware of the trade-off.
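A sketch of that single-method inner class alternative, shown in Java syntax (Comparator is just an example interface; the names are illustrative):

import java.util.Comparator;
import java.util.List;

class ClosureAlternative {
    // A named class with a single method, used where a closure would otherwise be
    // passed; calling compare() is an ordinary virtual call the JIT can optimise.
    static final class LengthComparator implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    }

    static void sortByLength(List<String> words) {
        words.sort(new LengthComparator());
    }
}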
Avoid the as keyword. as does a potentially expensive conversion, which is different from a simple cast. Instead of foo as Bar, write (Bar) foo if all you want is a cast. Groovy casts are still a bit more expensive than casts in Java, but they are much cheaper than conversions. Groovy casts also do a bit more than Java casts, such as converting between different numeric types. Conversions can be useful, but only use them if you actually want a conversion rather than a cast, and if the cost is worth the benefit in your case.
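A small sketch of the cast-versus-conversion distinction, shown in Java syntax (the values are only examples):

class CastVsConvert {
    static void example(Object value) {
        // A cast: only a runtime type check; no new object is created.
        String text = (String) value;

        // A conversion: real work that produces a new value of a different type.
        // (This is the kind of thing Groovy's "as" keyword can do behind the scenes.)
        Integer number = Integer.valueOf(text);
    }
}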
Apply the general Java optimisation techniques. They usually work just as well on Groovy as on Java.
Stuff you don't need to bother with: (This is not Groovy specific, but still applies)
Making everything final. If a local variable or parameter is not changed in practice, the JVM will see that and optimise it as if it were final. Final static class members can still help in some cases, but the compiler will keep track of, e.g., types just as well for a final field as for a non-final one.
Spending a lot of effort on avoiding object allocation. Object allocation is very cheap. The first generation copying garbage collector copies only the live objects and ignores dead ones, so there is almost no garbage collection cost to short lived objects. And in general the (current) garbage collector does most of the heavy work on a separate thread, so unless all your processor cores are saturated the garbage collection itself does not slow down your (single threaded) code appreciably. Reusing objects can introduce nasty bugs and is often ugly. You only need to look at your object creation if profiling indicates you are spending a significant amount of time on garbage collections.

Related

destructor in Kotlin programming language

I am new to Kotlin and have written a class in Kotlin to perform database operations.
I have set up the database connection in the constructor using init, but I want to close the database connection using a destructor.
Any idea how to achieve this using a Kotlin destructor?
Currently I have written a separate function to close the connection, but I would like to do it with a destructor, as in other programming languages such as PHP.
Handling resources that need to be closed in Kotlin
You can make your database wrapper implement Closeable. You can then use it like this:
val result = MyResource().use { resource ->
    resource.doThing() // the resource is closed automatically when the block exits
}
This way, your resource is available inside the use block; afterwards you get back result, which is whatever doThing() returns, and your resource is closed. As you haven't stored the resource in a variable, you also avoid accidentally using it after it is closed.
Why to avoid finalize
Finalizers are not safe; this describes some of the problems with them, such as:
They are not guaranteed to run at all
When they do run, there can be delays before they do
The link sums up the problems like this:
Finalizers are unpredictable, often dangerous, and generally unnecessary. Their use can cause erratic behavior, poor performance, and portability problems. Finalizers have a few valid uses, which we’ll cover later in this item, but as a rule of thumb, you should avoid finalizers.
C++ programmers are cautioned not to think of finalizers as Java’s analog of C++ destructors. In C++, destructors are the normal way to reclaim the resources associated with an object, a necessary counterpart to constructors. In Java, the garbage collector reclaims the storage associated with an object when it becomes unreachable, requiring no special effort on the part of the programmer. C++ destructors are also used to reclaim other nonmemory resources. In Java, the try-finally block is generally used for this purpose.
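For reference, a minimal Java sketch of that cleanup pattern in its modern try-with-resources form, which plays the same role as Kotlin's use above (the QueryRunner name is made up; DataSource and Connection are just the standard JDBC types used as an example):

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

class QueryRunner {
    // try-with-resources is the modern form of the try-finally cleanup the quote
    // mentions: close() is called automatically when the block exits.
    static void runQuery(DataSource dataSource) throws SQLException {
        try (Connection connection = dataSource.getConnection()) {
            // ... use the connection ...
        } // connection.close() runs automatically here, even if an exception was thrown
    }
}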
If you really need to use finalize
This link shows how to override finalize, but it is a bad idea unless absolutely necessary.
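For completeness, this is roughly what overriding finalize looks like on the JVM, sketched in Java (DatabaseWrapper and its connection field are hypothetical; in Kotlin you would declare a protected fun finalize() instead):

import java.sql.Connection;

class DatabaseWrapper {
    private Connection connection; // hypothetical field holding the open connection

    @Override
    protected void finalize() throws Throwable {
        try {
            if (connection != null) {
                connection.close(); // last-resort cleanup: may run late, or never
            }
        } finally {
            super.finalize(); // always chain to the superclass finalizer
        }
    }
}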

Performance in audio processing with Objective-C++

I'm currently developing an audio application and the performance is one of my main concerns.
There are really good articles like Four common mistakes in audio development or Real-time audio programming 101: time waits for nothing.
I understand that C++ is the way to go for audio processing, but I still have a question: does Objective-C++ slow down performance?
For example, with code like this:
@implementation MyObjectiveCppClass
- (float *)objCMethodWithOnlyCpp:(float *)input {
    // process in pure C++ code here
}
@end
Will this code be less efficient than the same one in a .cpp file?
Bonus question: what will happen if I use Grand Central Dispatch inside this method in order to parallelize the processing?
Calling an Objective-C method is slower than calling a pure C or C++ function, as the Objective-C runtime is invoked at every call. Whether it matters in your case depends on the number of samples processed in each call. If you are only processing one sample at a time, then this might be a problem. If you process a large buffer, then I wouldn't worry too much.
The best thing to do is to profile it and then evaluate the results against your requirements for performance.
And for your bonus question the answer is much the same. GCD comes at a cost, and if that cost is larger than what you gain by parallelising, then it is not worth it. So again, it depends on the amount of work you plan to do per call.
Regards
Klaus
To simplify, in the end ObjC and C++ code goes through the same compile and optimize chain as C. So the performance characteristics of identical code inside an ObjC or C++ method are identical.
That said, calling ObjC or C++ methods has different performance characteristics. ObjC provides a dynamically modifiable, binary stable ABI with its methods. These additional guarantees (which are great if you are exposing methods via public API from e.g. a framework) come at a slight performance cost.
If you are doing lots of calls to the same method (e.g. per sample or per pixel), that tiny performance penalty may add up. The same applies to ivar access, which (on the modern runtime) involves an extra load on top of what the equivalent member access would cost in C++, again to guarantee binary stability.
Similar considerations apply to GCD. GCD parallelizes operations, so you pay a penalty for thread switches (as with regular threads) and for each new block you dispatch. But the actual code in those blocks runs at just the same speed as it would anywhere else. Also, unlike starting your own threads, GCD re-uses threads from a pool, so you don't pay the overhead of creating new threads repeatedly.
It's a trade-off. Depending on what your code does and how it does it and how long individual operations take, either approach may be faster. You'll just have to profile and see.
The worst thing you can probably do is do things that don't need to be real-time (like updating your volume meter) in one of the real-time CoreAudio threads, or make calls that block.
PS - In most cases, performance differences will be negligible. So another aspect to focus on would be readability and easy maintenance of your code. E.g. using blocks can make code that has lots of asynchronicity easier to read because you can set up the blocks in the correct order in one method of your source file, making the flow clearer than if you split them across several methods that all just do a tiny thing and then start an asynchronous process.

Functional data-structures, OO notions of dispatched equality and comparison, StructuralEquality, and referential transparency

I have a very CPU intensive F# program that depends on persistent data-structures - about 40% of the total CPU time is spent in the Map module. So I thought I'd try out the PersistentHashMap in FSharpX collections. (BTW, this is already a big improvement over the previous version of F# in VS2013 where the same program spent 70% of its time in Map. I also notice that running programs with the debugger attached doesn't have the huge penalty it did before - good work guys...) There is also a hot-spot where I'm re-sorting all the time, where instead I should be adding to a Heap, so I thought I'd give that a go as well.
Two issues became immediately apparent:
(1) Swapping one out for the other proved harder than it seems it should be from an interface perspective, i.e., making a shim that let me switch from a Map to a PersistentMap while preserving both the needed module-based let-bound functions and the types necessary to use each map. I know that having full HM type-inference (and no type-classes) is largely orthogonal to LSP-style referential transparency, but maybe I was missing some way to do this better with a minimal amount of code.
(2) The biggest problem (which I'd like to focus on here) is the reliance of the F# functional data structures on OO-style dispatched equality and comparison via the IComparable (when 't : comparison), etc., family of interfaces and constraints.
Even for OO programs, ISTM that dispatching equality and comparison is a bad idea: an object "knows" how to perform its own domain-specific tasks, but it doesn't, for the most part, "know" what notion of equality is going to be needed at various points in the program for various purposes. So equality/comparison should not be part of the object's interface; when these concepts are needed, they should always be mentioned explicitly. For example, there should never be a .Sort(), only a .SortWith(...). One could argue that even something as basic as structural equality in F# could be explicit, a.StructEq(b) or a ~= b, since otherwise you always get object.Equals. But even stipulating that doing things this way is best for a multi-paradigm language that's a first-class .NET citizen, it seems like there should at least be the option of using passed-in comparison and equality functions, and this is not the case.
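(For contrast, this is the kind of explicit, passed-in ordering being argued for, sketched here in Java with its Comparator type; the Item record is purely illustrative and has nothing to do with the F# libraries in question.)

import java.util.Comparator;
import java.util.List;

class ExplicitOrderingSketch {
    record Item(String name, int priority) { }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item("b", 2), new Item("a", 1));
        // The same element type ordered two different ways, with the ordering passed
        // in explicitly at each call site instead of dispatched via an interface
        // implemented on Item itself.
        List<Item> byName = items.stream().sorted(Comparator.comparing(Item::name)).toList();
        List<Item> byPriority = items.stream().sorted(Comparator.comparingInt(Item::priority)).toList();
        System.out.println(byName);
        System.out.println(byPriority);
    }
}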
This means that: (a) type constraints are enforced even if you don't want them, causing ripples of broken inferred typing (and hundreds of wavy red lines with it being unclear where the actual "problem" is) and (b), that by implementing a notion of equality or comparison that makes one container type happy in one part of your program (and in my case I want to use the same container and item type with two different notions of ordering in two different places), it is likely to silently break (or cause inefficiency, if one subsumes the other) in other parts of the code that depended on the default/previous implementation.
The only way around this that I could think of is wrapping each item in an adapter object using a new ... with object expression, but I really don't want to create so much garbage just to get the code to work.
So, ISTM that we could have a "pure" version of each persistent data structure that could be loaded if desired (even basics like List, etc.), one that does not depend on dispatched equality/comparison/hashing and does not impose type constraints; all such needs would be met via functions passed in at the time of the call. (Dispatched eq/cmp would be used only for interop with BCL collections that don't accept delegates.) Then we could have an [EqCmpHashThrowNotImplemented] attribute, and I could be sure that there were no default operations happening at all, and I would feel better about the efficiency and predictability of my code. (This also lets one change from a Record to a Class or vice versa without worrying about any changes in behavior due to default implementations.) Again, this would be optional, and enabled with a simple import. (Which does mean that each base core collection type would have to be broken out into its own module, which isn't really a bad idea anyway.)
If I've overlooked a better way to do things or there are some patterns people are using here, I'd be interested.

Scala immutable vs mutable. What is the way one should go?

I'm just learning to program in Scala.
I have some experience in functional programming, as I have in object oriented programming.
My question is kind of simple, yet tricky:
Which structures should be used in Scala? Should we stick only to immutables, e.g. modifying lists by iterating through them and sticking a new one together, or go for mutables? What is your opinion on that, and what are the performance and memory-related aspects?
I'm inclined to program in a functional style, but it often takes an insane amount of effort to do things that are easily done using mutables. Is what to use situation-dependent?
Prefer immutable to mutable state. Use mutable state only where it is absolutely necessary. Some notable reasons include:
Performance. The standard libraries make wide use of vars and while loops, even though this is not idiomatic Scala. This should not be emulated, however, except for cases where you have profiled to determine that modifying the code to be more imperative will bring a significant performance gain.
I/O. Interacting with the outside world is inherently state-dependent, and thus must be dealt with in a mutable manner.
This is no different than the recommended coding style found in all major languages, imperative or functional. For example, in Java it is preferable to use data objects with only private final fields. Code written in an immutable (and functional) way is inherently easier to understand because when one sees a val, they know it will never change, reducing the possible number of states any particular object or function can be in.
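For example, a minimal sketch of such a Java data object (the Point class here is just an illustration):

public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    // "Modification" produces a new instance instead of mutating this one.
    public Point withX(int newX) { return new Point(newX, y); }
}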
In many cases, it also allows automatic parallel execution; for example, collection classes in Scala all have a par function, which returns a parallel collection that automatically runs calls to functions like map or reduce in parallel.
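For comparison, a similar idea is available with Java 8+ parallel streams; a minimal sketch with illustrative values:

import java.util.List;

class ParallelExample {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        // Because the data is immutable, the map/reduce steps can safely run in parallel.
        int sumOfSquares = numbers.parallelStream()
                                  .mapToInt(n -> n * n)
                                  .sum();
        System.out.println(sumOfSquares);
    }
}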
(I thought this must be a duplicate but couldn't easily find an earlier similar one, so I venture to answer...)
There is no general answer to this question. The rule of thumb suggested by the creators of Scala is to start with immutable vals and structures and stick to them as long as it makes sense. You can almost always create a workable solution to your problem this way. But if not, of course be pragmatic and use mutability.
Once you have a solution, you can tweak it, test it, measure its performance etc. If you find that e.g. it is too slow or overly complex, identify the critical part of it, understand what makes it problematic and - if needed - reimplement it using mutable variables, ideally keeping it isolated from the rest of the program. Note though that in many cases, a better solution can be found from within the immutable realm as well, so try looking there first. Especially for a beginner like myself, it still happens regularly that the best solution I could come up with looked contorted and complex with no apparent way to improve it - until seeing a simple and elegant solution to the same problem in a few lines of code, created by an experienced Scala developer who controls more of the power of the language and its libraries.
I usually obey the following rules:
Never use static mutable vars
Keep all user defined data types (typically case classes) immutable unless they are very expensive to copy. This will simplify a lot of the application logic.
If a data structure/collection is inherently mutable (i.e. it's designed to change over time), using a mutable data structure/collection might be appropriate. An example might be a large game world that is updated when players move. Remember to (almost) never share these data structures between threads though.
It's fine to use mutable local vars in methods
Use immutable collections for function results. These can be strictly or lazily evaluated depending on what gives best performance in the used context. Be careful if you use a lazily evaluated result which depends on a mutable collection though.

Does over-using function calls affect performance? Specifically in Fortran

I habitually write code with lots of functions, I find it makes it clearer. But now I'm writing some code in Fortran which needs to be very efficient, and I'm wondering whether over-using functions will slow it down, or whether the compiler will work out what's going on and optimise?
I know in Java/Python etc each function is an object, and so creating lots of functions would require them to be created in memory. I also know that in Haskell the functions are reduced into each other, so it makes little difference there.
Does anyone know about the case with Fortran? Is there a difference with using intent/pure functions/declaring fewer local variables/anything else?
Function calls carry a performance cost in stack-based languages like Fortran: each call has to set up a new stack frame, and so on.
Because of this, most compilers will try to inline function calls aggressively, if it is possible. Most of the time the compiler will make the right choice on whether or not to inline certain functions in your program.
This automatic inlining process means that there is no extra cost for writing your code as functions (at all).
This means that you should write your code as cleanly and as well organized as possible, and it is likely that the compiler will do these optimizations for you. It is more important that your overall strategy for solving the problem is efficient than to worry about the performance of individual function calls.
Just write the code in the simplest and most well-structured way you can, then when you have it written and tested you can profile it to see if there are any hotspots which require optimisation. Only at that point should you concern yourself with micro-optimisations, and this may not even be necessary if your compiler is doing its job.
I've just spent all morning tuning an app consisting of mixed C and Fortran, and of course it uses a lot of functions. What I found (and what I usually find) is not that functions are slow, but that certain function calls (and very few of them) don't really have to be done at all. For example, clearing memory blocks, just to be neat, but doing it at high frequency.
This is not a function of language, and not really a function of inlining either. Function calls could be free and you would still have the problem that the call tree tends to be more bushy than necessary. You need to find out where to prune it. This is the "profiling" method I rely on.
Whatever you do, find out what needs to be fixed. Don't Guess. Many people don't think of this kind of question as guessing, but when they find themselves asking "Will this work, will that help?", they're poking in the dark, rather than finding out where the problems are. Once they know where the problems are, the fix is obvious.
Typically, subroutine/function calls in Fortran have very little overhead. While the language standard doesn't specify the argument-passing mechanism, the typical implementation is "by reference", so no copying is involved, only the setup of the call itself. On most modern architectures this has little overhead. Selecting good algorithms is generally far more important than micro-optimizations.
An exception to calls being quick is the case in which the compiler has to create temporary arrays, for example when the actual argument is a non-contiguous array subsection and the dummy argument of the called procedure is a plain contiguous array. Suppose that the dummy argument is dimension (:). Calling it with an array of dimension (:) is simple. If you pass a non-unit stride in the call, e.g. array(1:12:3), then the argument is non-contiguous and the compiler may need to create a temporary copy. Now suppose that the actual argument is dimension (:,:). If the call passes array(:,j), the sub-array is contiguous, since in Fortran the first index varies fastest in memory, and shouldn't need a copy. But array(i,:) is non-contiguous and might require a temporary copy.
Some compilers have options to warn you when temporary array copies are needed so that you can change your code, if you wish.