Scala immutable vs mutable: which way should one go?

I'm just learning to program in scala.
I have some experience in functional programming, as well as in object-oriented programming.
My question is kind of simple, yet tricky:
Which structures should be used in Scala? Should we stick only to immutables, e.g. modifying lists by iterating over them and assembling new ones, or should we go for mutables? What is your opinion on that, and what are the performance and memory-related aspects?
I'm inclined to program in a functional style, but it often takes an insane amount of effort to do things that are easy with mutables. Is the choice situation-dependent?

Prefer immutable to mutable state. Use mutable state only where it is absolutely necessary. Some notable reasons include:
Performance. The standard libraries make wide use of vars and while loops, even though this is not idiomatic Scala. This should not be emulated, however, except in cases where profiling has shown that rewriting the code in a more imperative style brings a significant performance gain.
I/O. Interacting with the outside world is inherently state-dependent, and thus must be dealt with in a mutable manner.
This is no different than the recommended coding style found in all major languages, imperative or functional. For example, in Java it is preferable to use data objects with only private final fields. Code written in an immutable (and functional) way is inherently easier to understand because when one sees a val, they know it will never change, reducing the possible number of states any particular object or function can be in.
In many cases it also allows automatic parallel execution: for example, the collection classes in Scala all have a par method, which returns a parallel collection that automatically runs calls to functions like map or reduce in parallel.
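As a minimal sketch of what that looks like (assuming Scala 2.13+, where parallel collections live in the separate scala-parallel-collections module rather than the standard library):

```scala
// Parallel collections sketch. In Scala 2.13+ the .par extension comes from the
// "org.scala-lang.modules" %% "scala-parallel-collections" dependency; in 2.12
// and earlier it is part of the standard library and this import is not needed.
import scala.collection.parallel.CollectionConverters._

val numbers = (1 to 1000000).toVector

// The same map/sum pipeline, run sequentially and in parallel.
val sumSeq = numbers.map(n => n.toLong * n).sum
val sumPar = numbers.par.map(n => n.toLong * n).sum  // work is split across threads

assert(sumSeq == sumPar)
```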

(I thought this must be a duplicate but couldn't easily find an earlier similar one, so I venture to answer...)
There is no general answer to this question. The rule of thumb suggested by the creators of Scala is to start with immutable vals and structures and stick to them as long as it makes sense. You can almost always create a workable solution to your problem this way. But if not, of course be pragmatic and use mutability.
Once you have a solution, you can tweak it, test it, measure its performance, etc. If you find that, for example, it is too slow or overly complex, identify the critical part, understand what makes it problematic and - if needed - reimplement it using mutable variables, ideally keeping it isolated from the rest of the program. Note, though, that in many cases a better solution can be found within the immutable realm as well, so try looking there first. Especially for a beginner like myself, it still happens regularly that the best solution I can come up with looks contorted and complex, with no apparent way to improve it - until I see a simple and elegant solution to the same problem in a few lines of code, written by an experienced Scala developer who commands more of the language and its libraries.
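As a hedged illustration of that workflow (the function names are invented for the sketch): start with an immutable-first version, and if profiling ever shows it to be a hotspot, hide a mutable reimplementation behind the same signature so the mutability stays isolated:

```scala
// Immutable-first version: no shared state, easy to reason about.
def squaresOfEvens(xs: List[Int]): List[Int] =
  xs.filter(_ % 2 == 0).map(x => x * x)

// Hypothetical tuned version behind the exact same signature; the mutable
// buffer and var never escape the method, so callers are unaffected.
def squaresOfEvensFast(xs: List[Int]): List[Int] = {
  val buf  = scala.collection.mutable.ListBuffer.empty[Int]
  var rest = xs
  while (rest.nonEmpty) {
    val x = rest.head
    if (x % 2 == 0) buf += x * x
    rest = rest.tail
  }
  buf.toList
}
```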

I usually obey the following rules:
Never use static mutable vars
Keep all user-defined data types (typically case classes) immutable unless they are very expensive to copy. This will simplify a lot of the application logic (see the sketch after this list).
If a data structure/collection is inherently mutable (i.e. it's designed to change over time), using a mutable data structure/collection might be appropriate. An example might be a large game world that is updated when players move. Remember to (almost) never share these data structures between threads though.
It's fine to use mutable local vars in methods
Use immutable collections for function results. These can be strictly or lazily evaluated depending on what gives best performance in the used context. Be careful if you use a lazily evaluated result which depends on a mutable collection though.
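A small sketch of how the case-class, local-var and result-collection rules can look in practice (the Player type is made up purely for illustration):

```scala
// Immutable case class: "updates" return modified copies instead of mutating.
final case class Player(name: String, score: Int) {
  def addPoints(points: Int): Player = copy(score = score + points)
}

// A mutable local var inside a method is fine; it never leaks outside.
def totalScore(players: Seq[Player]): Int = {
  var total = 0
  for (p <- players) total += p.score
  total
}

// Results handed back to callers are immutable collections.
def topScorers(players: Seq[Player], threshold: Int): List[Player] =
  players.filter(_.score >= threshold).toList
```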

Related

Why do we use objects in OOP

I'm new to programming and especially OOP. I have been learning OOP in JS for some time and encountered the term object in OOP.
Is it true that if we do not use objects, we have to repeat the same logic over and over (by copy-pasting), and that if we then find a bug in that logic, we have to change it in every place where it is used?
Thanks to objects, we encapsulate a certain piece of logic in one place and then reuse it, and even if we later find a bug in that logic, there is only one place where we need to make changes: the object itself.
So, basically, an object is responsible for one piece of logic that we can reuse instead of copy-pasting it in several places. Is my understanding of objects correct?
OK, first of all, copy/paste (code duplication) is always a bad idea. But even without objects you can write code in functions and use them to keep your code clean!
In OOP you try to treat everything as an object that has properties and maybe some functions. This helps you not only to keep your code clean, but it also makes working with data easier, lets you think bigger, and gives you the chance to add another layer of abstraction!
So OOP and software design patterns help keep code clean and simple so that you can achieve bigger things. Just imagine writing a complex piece of software without keeping everything organized in objects, using only functions that take and return primitive data or maybe arrays. It would be much harder to wrap your head around, or to find bugs in!
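To make the "one place to fix" idea concrete, here is a tiny sketch (written in Scala for consistency with the rest of this page; the type and the tax rate are invented):

```scala
// The pricing rule lives in exactly one place; every caller reuses it.
final case class Order(subtotal: Double) {
  def totalWithTax: Double = subtotal * 1.20 // fix a bug here and every use site is fixed
}

// Without the object, each call site would copy-paste "subtotal * 1.20",
// and a bug in that expression would have to be hunted down everywhere.
val orders = List(Order(10.0), Order(25.5))
val totals = orders.map(_.totalWithTax)
```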

Functional data-structures, OO notions of dispatched equality and comparison, StructuralEquality, and referential transparency

I have a very CPU intensive F# program that depends on persistent data-structures - about 40% of the total CPU time is spent in the Map module. So I thought I'd try out the PersistentHashMap in FSharpX collections. (BTW, this is already a big improvement over the previous version of F# in VS2013 where the same program spent 70% of its time in Map. I also notice that running programs with the debugger attached doesn't have the huge penalty it did before - good work guys...) There is also a hot-spot where I'm re-sorting all the time, where instead I should be adding to a Heap, so I thought I'd give that a go as well.
Two issues became immediately apparent:
(1) Swapping out one for the other from an interface perspective proved harder than it seems it should be - i.e., making a shim that let me switch from a Map to a PersistentMap, preserving both the needed module-based let-bound functions and the types necessary to use each map. I know that having full HM type inference (and no type classes) is orthogonal to LSP-style referential transparency for the most part - but maybe I was missing some way to do this better with a minimal amount of code.
(2) The biggest problem (which I'd like to focus on here) is the reliance of the F# functional data structures on OO-style dispatched equality and comparison via the IComparable (when 't : comparison), etc., family of interfaces.
Even for OO programs, ISTM that dispatching equality and comparison is a bad idea - an object "knows" how to perform its own domain-specific tasks, but for the most part it doesn't "know" which notion of equality is going to be needed at various points in the program for various purposes - so equality/comparison should not be part of the object's interface; when these concepts are needed, they should always be mentioned explicitly. For example, there should never be a .Sort(), only a .SortWith(...). One could argue that even something as basic as structural equality in F# should be explicit - a.StructEq(b) or a ~= b - otherwise you always get object.Equals. But even stipulating that doing things this way is best for a multi-paradigm language that's a first-class .NET citizen, it seems like there should at least be the option of using passed-in comparison and equality functions, and this is not the case.
This means that (a) type constraints are enforced even if you don't want them, causing ripples of broken inferred typing (and hundreds of wavy red lines, with it being unclear where the actual "problem" is), and (b) that by implementing a notion of equality or comparison that makes one container type happy in one part of your program (and in my case I want to use the same container and item type with two different notions of ordering in two different places), you are likely to silently break (or make inefficient, if one subsumes the other) other parts of the code that depended on the default/previous implementation.
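For what it's worth, this is roughly what "pass the comparison at the call site" looks like in other languages; a loose analogy in Scala rather than F# (types invented for the sketch) is to keep the element type free of any baked-in ordering and state the comparison explicitly wherever it is needed:

```scala
// A plain data type that does not commit to any single notion of ordering.
final case class Task(name: String, priority: Int)

val tasks = List(Task("deploy", 2), Task("review", 5))

// Each call site states which comparison it wants, in the spirit of
// .SortWith(...) rather than a baked-in .Sort().
val byPriority = tasks.sortWith(_.priority < _.priority)         // comparison passed as a function
val byName     = tasks.sorted(Ordering.by[Task, String](_.name)) // ordering passed as a value
```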
The only way I could think of to work around this in F# itself is wrapping each item in an adapter object using a new ... with object expression - but I really don't want to create so much garbage just to get the code to work.
So, ISTM that we could have a "pure" version of each persistent data structure that could be loaded if desired (even basics like List, etc.) that does not depend on dispatched equality/comparison/hashing and does not impose type constraints - all such needs would be met via functions passed in at the time of the call. (Dispatched eq/cmp would be used only for interop with BCL collections that don't accept delegates.) Then we could have an [EqCmpHashThrowNotImplemented] attribute, and I could be sure that there were no default operations happening at all, and I would feel better about the efficiency and predictability of my code. (And this also lets one change from a Record to a Class or vice versa without worrying about any changes in behavior due to default implementations.) Again, this would be optional, but enabled with a simple import. (Which does mean that each base core collection type would have to be broken out into its own module, which isn't really a bad idea anyway.)
If I've overlooked a better way to do things or there are some patterns people are using here, I'd be interested.

Pascal-style arrays, built-in len() function vs .length? What are the pros/cons

What are the differences between these two? Why would you pick one over the other? Is it just personal preference, or is there an actual reason why you would use either a built-in function or whatever .length is?
I think using *.length over *.length() or len(*) is something of a historical artifact, probably done to make getting the length of an array as fast as possible. Arrays, after all, are a very basic data structure in many languages, and getting the length of one is an extremely common operation. And accessing a property is much faster than calling a method.
Nowadays a compiler could probably optimize that kind of thing out, but back then I think there was a pull towards ease-of-implementation which guided many languages to simply have *.length as a property.
However, in any OOP language at least, it's more consistent to have *.length(), because while arrays have immutable lengths and can afford to expose *.length as a constant value, other data structures to which you can add or remove values would not be able to do this.
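As a small illustration of the distinction on the JVM (sketched in Scala, whose uniform access syntax happens to hide it at the call site): an array's length is a fixed value baked into the object when it is created, while a growable collection has to answer through an ordinary method because the value changes over time.

```scala
import scala.collection.mutable.ArrayBuffer

val fixed = Array(1, 2, 3)
println(fixed.length)     // compiles down to the JVM's built-in array-length access

val growable = ArrayBuffer(1, 2, 3)
growable += 4
println(growable.length)  // an ordinary method, because the size changes over time
```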

Using OOP results in heavy objects? Will they be slow?

I'm using OOP to write small games with different types of characters (e.g. platformers, shooters) that do different types of things. I typically try to spread out functionality into easily manageable, simple classes (e.g. an Environment class would perform common physics calculations for all its Inhabitants, so they don't need to worry about that). But, it seems that the more I refactor these programs to align with OOP principles, the heavier my character objects get. Since they're the ones with the important data, they use their own data to perform functions on themselves. This keeps them decoupled from things outside of their realm, but makes their classes seem to grow and grow. I'm totally comfortable with breaking these character classes down into more manageable components, but I worry that having many objects onscreen that are instantiated from classes with a lot of methods will result in a slow-running game.
1) Does the number of instance methods on an object directly impact its runtime performance?
2) Am I using OOP correctly if I end up with heavy character objects?
No. Or at least mostly no, anyway.
Maybe, but probably not.
For a character-based game, it's perfectly reasonable that a character would have a lot of associated data. Efficiency is rarely affected by whether you represent that as a single "flat" collection of primitive objects or as a tree-like collection of a few large objects, each of which (recursively) has a number of smaller constituent parts.
As far as number of methods affecting performance: the number of methods can affect cache utilization, especially if you have (for example) lots of extremely small methods, and heavily-used methods are more or less interleaved with less used ones, so you end up with a lot of cache space devoted to less-used methods because they happen to be in the same cache line with something that's used more. Being methods affects this primarily because a compiler will typically arrange methods of the same class close to each other in memory, so sharing cache lines becomes more common. At least with typical implementations, however, calling a method will be O(1), so the number of methods doesn't directly affect speed.
No, it's not what methods you have in an object but what you do with them that increases runtime cost. Of course there is a limit to this, but with current hardware you can completely forget about it. However, going beyond a dozen or two members in a class is often questionable from a design standpoint. Splitting your objects up doesn't need to incur any significant cost: you can inline all your getters and setters, and pass values by pointer or reference. The compiler can flatten all your design decisions out, and mostly the code from a "heavy" class is equivalent to the code from a constellation of small classes.
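As a sketch of what "splitting up" can look like without changing what callers see (the names are invented): a heavy character type can delegate to small component classes, and on the JVM these thin forwarding methods are exactly the kind of thing the JIT routinely inlines.

```scala
// Small, focused components instead of one giant class.
final case class Position(x: Double, y: Double) {
  def moved(dx: Double, dy: Double): Position = Position(x + dx, y + dy)
}

final case class Health(current: Int, max: Int) {
  def damaged(amount: Int): Health = copy(current = (current - amount).max(0))
}

// The character composes them; its own methods are thin forwarders.
final case class Character(position: Position, health: Health) {
  def move(dx: Double, dy: Double): Character = copy(position = position.moved(dx, dy))
  def hit(amount: Int): Character             = copy(health = health.damaged(amount))
}
```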
"Correctly" in this context is entirely dependent on the taste of the people developing the code. The processor doesn't care about what software-engineering design decisions you make. If you want to make your objects all-encompassing and it feels right to you, then do it. There might come a point where things don't feel "right" to you; at that point you might split things up.

Does over-using function calls affect performance? Specifically in Fortran

I habitually write code with lots of functions, I find it makes it clearer. But now I'm writing some code in Fortran which needs to be very efficient, and I'm wondering whether over-using functions will slow it down, or whether the compiler will work out what's going on and optimise?
I know in Java/Python etc each function is an object, and so creating lots of functions would require them to be created in memory. I also know that in Haskell the functions are reduced into each other, so it makes little difference there.
Does anyone know about the case with Fortran? Is there a difference with using intent/pure functions/declaring fewer local variables/anything else?
Function calls carry a performance cost in stack-based languages like Fortran: each call has to push a new frame onto the stack, and so on.
Because of this, most compilers will try to inline function calls aggressively, if it is possible. Most of the time the compiler will make the right choice on whether or not to inline certain functions in your program.
This automatic inlining process means that there is no extra cost at all for writing your function.
This means that you should write your code as cleanly and as well-organized as possible, and it is likely that the compiler will do these optimizations for you. It is more important that your overall strategy for solving the problem is efficient than to worry about the performance of individual function calls.
Just write the code in the simplest and most well-structured way you can, then when you have it written and tested you can profile it to see if there are any hotspots which require optimisation. Only at that point should you concern yourself with micro-optimisations, and this may not even be necessary if your compiler is doing its job.
I've just spent all morning tuning an app consisting of mixed C and Fortran, and of course it uses a lot of functions. What I found (and what I usually find) is not that functions are slow, but that certain function calls (and very few of them) don't really have to be done at all. For example, clearing memory blocks, just to be neat, but doing it at high frequency.
This is not a function of language, and not really a function of inlining either. Function calls could be free and you would still have the problem that the call tree tends to be more bushy than necessary. You need to find out where to prune it. This is the "profiling" method I rely on.
Whatever you do, find out what needs to be fixed. Don't guess. Many people don't think of this kind of question as guessing, but when they find themselves asking "Will this work, will that help?", they're poking in the dark rather than finding out where the problems are. Once they know where the problems are, the fix is obvious.
Typically, subroutine/function calls in Fortran have very little overhead. While the language standard doesn't specify the argument-passing mechanism, the typical implementation is "by reference", so no copying is involved - only the overhead of setting up the call. On most modern architectures this is small. Selecting good algorithms is generally far more important than micro-optimizations.
An exception to calls being quick is the case in which the compiler has to create temporary arrays - for example, when the actual argument is a non-contiguous array subsection and the called procedure's dummy argument is a plain contiguous array. Suppose that the dummy argument is dimension (:). Calling it with an array of dimension (:) is simple. If you pass a non-unit stride in the call, e.g. array(1:12:3), then the section is non-contiguous and the compiler may need to create a temporary copy. Suppose instead that the actual argument is dimension (:,:). If the call passes array(:,j), the sub-array is contiguous, since in Fortran the first index varies fastest in memory, and shouldn't need a copy. But array(i,:) is non-contiguous and might require a temporary copy.
Some compilers have options to warn you when temporary array copies are needed so that you can change your code, if you wish.