Does over-using function calls affect performance? Specifically in Fortran - optimization

I habitually write code with lots of functions, I find it makes it clearer. But now I'm writing some code in Fortran which needs to be very efficient, and I'm wondering whether over-using functions will slow it down, or whether the compiler will work out what's going on and optimise?
I know in Java/Python etc each function is an object, and so creating lots of functions would require them to be created in memory. I also know that in Haskell the functions are reduced into each other, so it makes little difference there.
Does anyone know about the case with Fortran? Is there a difference with using intent/pure functions/declaring fewer local variables/anything else?

Function calls carry a performance cost for stack based languages, like Fortran. They have to add on to the stack and such.
Because of this, most compilers will try to inline function calls aggressively, if it is possible. Most of the time the compiler will make the right choice on whether or not to inline certain functions in your program.
This automatic inlining process will mean that there is no extra cost for writing your function (at all).
This means that you should write your code as cleanly and organized as possible, and it is likely that the compiler will do these optimizations for you. It is more important that your overall strategy for solving the problem is the most efficient than worry about performance of function calls.

Just write the code in the simplest and most well-structured way you can, then when you have it written and tested you can profile it to see if there are any hotspots which require optimisation. Only at that point should you concern yourself with micro-optimisations, and this may not even be necessary if your compiler is doing its job.

I've just spent all morning tuning an app consisting of mixed C and Fortran, and of course it uses a lot of functions. What I found (and what I usually find) is not that functions are slow, but that certain function calls (and very few of them) don't really have to be done at all. For example, clearing memory blocks, just to be neat, but doing it at high frequency.
This is not a function of language, and not really a function of inlining either. Function calls could be free and you would still have the problem that the call tree tends to be more bushy than necessary. You need to find out where to prune it. This is the "profiling" method I rely on.
Whatever you do, find out what needs to be fixed. Don't Guess. Many people don't think of this kind of question as guessing, but when they find themselves asking "Will this work, will that help?", they're poking in the dark, rather than finding out where the problems are. Once they know where the problems are, the fix is obvious.

Typically subroutine / function calls in Fortran will have very little overhead. While the language standard doesn't specify argument passing mechanisms, the typical implementation is "by reference" so no copying is involved, only setting up a new procedure. On most modern architectures this has little overhead. Selecting good algorithms is generally far more important than micro-optimizations.
An exception about calling be quick could be case in which the compiler has to create temporary arrays, for example, if the actual argument is a non-contiguous array subsection and the called procedure argument is a plain contiguous array. Suppose that the dummy argument is dimension (:). Calling it with an array of dimension (:) is simple. If you request a non-unit stride in the call, e.g., array (1:12:3), then the array is non-contiguous and the compiler may need to create a temporary copy. Suppose that the actual argument is dimension (:,:). If the call has array (:,j), the sub-array is contiguous since in Fortran the first index varies fastest in memory and shouldn't need a copy. But array (i,:) is non-contiguous and might require a temporary copy.
Some compilers have options to warn you when temporary array copies are needed so that you can change your code, if you wish.

Related

What is the exhaustive list of guidelines/practices/rules to fully conform with functional paradigm?

I've started playing around with Kotlin, but I sense my own limitation in the way I program. My problem is that I still think Java therefore the style is still imperative, my question is to all functional programming zealots , which I believe would be very useful to all people who at the very beginning stage and also need to 'brake' their brain to start building it again; to leave comfort zone and start thinking pseudo and not in "whatever is your first language". I believe it is possible for highly experienced polyglot developers to chew the concepts down to plain advices of what makes your program being written in entirely functional way and what violates the paradigm. I don't know all the quirks but please don't hesitate to include universally accepted terms which might be unknown to me(I can always lookup). At this point I need this set of rules to make myself suffer at first and not break them but then I know I will feel it, analyze guidelines and understand how they are worse/better which of course is my own homework.
So example of these guidelines, would be something like:
Never change state, this can be avoided by using x, y, z
Operate using higher order functions only (I maybe wrong, just example)
I hope the answer will give me long term reference to put myself in extreme conditions where I stop escaping to OOP whenever I feel uncomfortable. And now when I look at Kotlin I understand how I've should've been thinking about problems, it is about intention not about the structure imposed by one language or another. Intention can always be converted to a language of your choice and backed up by design patterns applicable to the language, but to find that middle ground I need to jail myself first from the comfort zone.
Avoid mutable state like the plague.
One of the main points of using functional programming, possibly the main one, is to avoid all the little pitfalls, bugs, issues one needs to deal with when using mutable state. You should do everything you can in order to avoid mutating state. For instance, instead of using C-style for-loops where you need to keep a counter variable updated, use map and other higher-order functions in order to abstract away your iteration patterns. This also means that you should never change the value of a variable if you can avoid that. Instead, you should be defining almost all of your variables, preferrably all of them, as constants, and using functions to compute new values from them instead of mutating them.
Avoid side-effects like the plague.
Mutable state's ugly cousin, side-effects. Side effects mean anything other than taking a value and returning a value in a function. If that function prints data, mutates global variables, sends messages to threads, or anything, anything other than simply taking its parameters, computing a value from them, and returning a value, that function has side-effects. Side-effects are important (see next bullet point), but if you use them a lot, they get impossible to track. Just think of how everyone tells you to avoid global variables in imperative programming. Functional programming goes a step further and tries to avoid all side-effects. The bulk of your program should be made of pure functions. (See ahead)
When you need to use side-effects, keep them contained.
Yes, I just told you to run away from side-effects. However, no program is useful without side-effects of some kind. Graphical User Interface? Side-effect. Audio output? Side-effect. Printing to a shell? Side-effect. So you can't really get rid of side-effects if you want to build useful stuff.
What you should do instead is write your code so that all your side-effecting code lives in a thin layer which mostly calls pure functions and then does the required side-effects using the result of these pure function calls.
Use pure functions for everything you can.
This is sort of the flipside of the previous point. A pure function is a function which has no side-effects and does not mutate anything. It can only take in parameters and return a value. You should use these a lot. For instance, instead of doing your logging within functions which are computing stuff, you should be constructing your log strings using pure functions, and then letting your side-effects layer call these pure functions, call more pure functions in order to format the log strings into a full log, and then output the log itself from your side-effects layer.
Use higher-order functions to structure your code.
Higher-order functions are, in a way, the glue that makes functional programming work. A higher-order function is a function which takes one or more functions as parameters and/or returns a function. The power of higher-order functions is that they can encapsulate many of the patterns which you would use in an imperative-style program in a declarative manner. For instance, let's take a look at the three most common higher-order functions:
map is a function which takes a function and a list of values, applies its function argument to each of those values, and returns a new list with the results. map encapsulates the whole pattern of iterating over a list doing an operation on each value in a declarative manner.
filter is a function which takes a function which returns a boolean and a list of values, applies its function argument to each of those values and returns a list containing only those values for which its function argument returns true. It encapsulates the whole pattern of selecting results from a list in a declarative manner.
reduce, also known as fold, takes an initial value, a binary function and a list of values. It uses its function argument to combine the initial value with the first value of the list, then combines the result with the next value of the list and keeps on doing this until it has reduced the list to just one single value. It encapsulates the entire pattern of obtaining an aggregate value from a list of values.
This is in no way an exhaustive list of higher-order functions, but these three are the most common ones. I hope this has been enough to show how you can structure code which would require a lot of tracking variables using only functions in a declarative manner. If you use these higher-order functions well, it's likely you won't ever need a for or while loop again.
This is definitely not an exhaustive list of functional programming practices, but I think most functional programmers would agree these five guidelines form the core of what functional programming is about. If you want to really learn how to apply these, my advice would be to learn a pure functional programming language such as Haskell, so you are forced to abandon the imperative paradigm and to learn how to structure things functionally instead. I would recommend the fantastic Haskell Programming from First Principles as a starting resource if you choose to go this way. In case you don't want to/can't put down the cash, Brent Yorgey's Haskell course at UPenn is also a great free resource.

Vulkan: Any downsides to creating pipelines, command-buffers, etc one at a time?

Some Vulkan objects (eg vkPipelines, vkCommandBuffers) are able to be created/allocated in arrays (using size + pointer parameters). At a glance, this appears to be done to make it easier to code using common usage patterns. But in some cases (eg: when creating a C++ RAII wrapper), it's nicer to create them one at a time. It is, of course, simple to achieve this.
However, I'm wondering whether there are any significant downsides to doing this?
(I guess this may vary depending on the actual object type being created - but I didn't think it'd be a good idea to ask the same question for each object)
Assume that, in both cases, objects are likely to be created in a first-created-last-destroyed manner, and that - while the objects are individually created and destroyed - this will likely happen in a loop.
Also note:
vkCommandBuffers are also deallocated in arrays.
vkPipelines are destroyed individually.
Are there any reasons I should modify my RAII wrapper to allow for array-based creation/destruction? For example, will it save memory (significantly)? Will single-creation reduce performance?
Remember that vkPipeline creation does not require external synchronization. That means that the process is going to handle its own mutexes and so forth. As such, it makes sense to avoid locking those internal mutexes whenever possible.
Also, the process is slow. So being able to batch it up and execute it into another thread is very useful.
Command buffer creation doesn't have either of these concerns. So there, you should feel free to allocate whatever CBs you need. However, multiple creation will never harm performance, and it may help it. So there's no reason to avoid it.
Vulkan is an API designed around modern graphics hardware. If you know you want to create a certain number of objects up front, you should use the batch functions if they exist, as the driver may be able to optimize creation/allocation, resulting in potentially better performance.
There may (or may not) be better performance (depending on driver and the type of your workload). But there is obviously potential for better performance.
If you create one or ten command buffers in you application then it does not matter.
For most cases it will be like less than 5 %. So if you do not care about that (e.g. your application already runs 500 FPS), then it does not matter.
Then again, C++ is a versatile language. I think this is a non-problem. You would simply have a static member function or a class that would construct/initialize N objects (there's probably a pattern name for that).
The destruction may be trickier. You can again have static member function that would destroy N objects. But it would not be called automatically and it is annoying to have null/husk objects around. And the destructor would still be called on VK_NULL_HANDLE. There is also a problem, that a pool reset or destruction would invalidate all the command buffer C++ objects, so there's probably no way to do it cleanly/simply.

Is there a difference memory-wise between using Common Blocks and Modules in Fortran 90

I am recently learning Fortran without any guidance, and experimenting with different versions. I have found from this site:
Is a MODULE better than a COMMON block?
Almost always yes. The only reasons to use COMMON blocks are if you
expect to use your program on a computer with only a FORTRAN 77
compiler (they still exist), or if it is very important that you
control the order in which your data is stored in memory.
Well, using modules is surely syntactically sweeter than using common blocks. But what are the differences in terms memory usage and allocation in both cases? Also does it make a difference in terms performance and access speed? Does that question make sense?
M.S.B. has it in his answer, but does not stress it enough in my opinion. The variables in COMMON blocks are laid out in memory exactly in the order in the definition of the block. From this the restriction, that no dynamic memory objects (allocatable, pointer) can be in a COMMON block, immediately follows.
The "sequence association" means you can count on the placement of the variables in such a way that you can, e.g., use two following arrays as a large one.
COMMON blocks have probably no place in modern code, although they are not declared obsolete.
When it comes to speed, if the variable is the same, there shouldn't be any difference in accessing it, whether it is in a module or in a COMMON block.
One difference memory-wise is that you can use allocatable arrays in modules, but not in common. (See Fortran common variables, allocatable array). Much more convenient if you have an array that you don't know the size of the array at compile time. The old FORTRAN way was to declare the array at some huge size that was hopefully large enough, but which typically wasted space. With the allocatable array you can allocate the array at the precise size needed.
I never use COMMON in new code. It is more limited and brings in the unnecessary "sequence association", i.e., associating variables by their layout in memory.

The use of inline functions in Objective-C

I do not understand the point of 'inline functions'.
I know they are faster (such as macros) than ordinary functions, but why isn't every function an 'inline-function'?
Functions that are called from more than one place in code could be inlined, but the expense of that (ever so slight) performance boost is code size. Generally it's considered better to add the few instructions to make a subroutine call than it is to consume additional space inlining a function.
Functions that are called from exactly one place may be inlined without extra overhead.
Code size - if a function is called from multiple locations duplicating the code everywhere can quickly cause significant growth in overall code size.
Code size - Infinite! It is simply impossible to inline code which is recursive, directly or indirectly.

Scala immutable vs mutable. What is the way one should go?

I'm just learning to program in scala.
I have some experience in functional programming, as I have in object oriented programming.
My question is kind of simple, yet tricky:
Which structures should be used in Scala? Should we only stick to immutables, eg. modifing lists by iterating through it and stick a new one together, or go for mutables? What is your opinion on that, what are the performance aspects, memory related aspects, ...
I'm likely to program in a functional style, but it often expands to an insane amount of effort to do things which are easily done by using mutables. Is it situation dependent, what to use?
Prefer immutable to mutable state. Use mutable state only where it is absolutely necessary. Some notable reasons include:
Performance. The standard libraries make wide use of vars and while loops, even though this is not idiomatic Scala. This should not be emulated, however, except for cases where you have profiled to determine that modifying the code to be more imperative will bring a significant performance gain.
I/O. I/O, or interacting with the outside world is inherently state dependent, and thus must be dealt with in a mutable manner.
This is no different than the recommended coding style found in all major languages, imperative or functional. For example, in Java it is preferable to use data objects with only private final fields. Code written in an immutable (and functional) way is inherently easier to understand because when one sees a val, they know it will never change, reducing the possible number of states any particular object or function can be in.
In many cases, it also allows automatic parallel execution, for example, collection classes in Scala all have a par function, which will return a parallel collection that automatically run the calls to functions like map or reduce in parallel.
(I thought this must be a duplicate but couldn't easily find an earlier similar one, so I venture to answer...)
There is no general answer to this question. The rule of thumb suggested by the creators of Scala is to start with immutable vals and structures and stick to them as long as it makes sense. You can almost always create a workable solution to your problem this way. But if not, of course be pragmatic and use mutability.
Once you have a solution, you can tweak it, test it, measure its performance etc. If you find that e.g. it is too slow or overly complex, identify the critical part of it, understand what makes it problematic and - if needed - reimplement it using mutable variables, ideally keeping it isolated from the rest of the program. Note though that in many cases, a better solution can be found from within the immutable realm as well, so try looking there first. Especially for a beginner like myself, it still happens regularly that the best solution I could come up with looked contorted and complex with no apparent way to improve it - until seeing a simple and elegant solution to the same problem in a few lines of code, created by an experienced Scala developer who controls more of the power of the language and its libraries.
I usually obey the following rules:
Never use static mutable vars
Keep all user defined data types (typically case classes) immutable unless they are very expensive to copy. This will simplify a lot of the application logic.
If a data structure/collection is inherently mutable (i.e. it's designed to change over time), using a mutable data structure/collection might be appropriate. An example might be a large game world that is updated when players move. Remember to (almost) never share these data structures between threads though.
It's fine to use mutable local vars in methods
Use immutable collections for function results. These can be strictly or lazily evaluated depending on what gives best performance in the used context. Be careful if you use a lazily evaluated result which depends on a mutable collection though.