How can I handle binary data in Frege?

I'm new to Frege, although I know both Java and Haskell.
I'm porting some Haskell code that uses ByteString, and I'm trying to figure out what to use in Frege. I assume I would want to use something whose underlying Java representation is byte[], but I'm not sure how Frege wraps that.
In particular, I looked through PreludeArrays.fr, and I noticed that there's an instance of PrimitiveArrayElement for every primitive Java type except byte.
I feel like there's something obvious I'm missing. How do I go about dealing with binary data in Frege? Are there any examples of how to do so?

There actually is such an instance. It just can't be in PreludeArrays for technical reasons; rather, it lives in frege.java.Lang, where Byte and Short are introduced.
Even if there were none, you could simply write
instance PrimitiveArrayElement Byte
and it should work.
Regarding your question: I think it is safe to say that JArray Byte should be OK for any problem with any data. Another question is whether it is the best representation. For example, if those data are actually UTF-8 strings, converting them to String would be the way to go.
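For concreteness, here is a minimal sketch of what that could look like in Frege. The module name, the Buffer alias and the size helper are my own illustration, and it assumes that JArray and arrayLength are available under those names from PreludeArrays in your Frege version; the answer above only guarantees that the Byte instance exists.

module examples.BinaryData where

import frege.java.Lang (Byte)   -- Byte (and Short) live here, as noted above;
                                -- depending on your setup this import may already be implicit

-- raw binary data backed by a Java byte[]  (Buffer is just an illustrative alias)
type Buffer = JArray Byte

-- only needed if the library did not already provide the instance:
-- instance PrimitiveArrayElement Byte

-- hypothetical helper: number of bytes in the buffer, assuming PreludeArrays
-- exports arrayLength with this shape
size :: Buffer -> Int
size buf = arrayLength buf

The trade-offs between mapArray/foldArray and map/fold discussed just below apply to arrays like this.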
Things to consider
mapArray, foldArray and friends are space efficient, but strict and a bit slow, because they use the ST monad.
Conversely, map and fold are reasonably fast, but of course squander lots of memory.
An approach I have used in frege.data.Hashmap was to identify a few very basic array operations, implement them in Java (one can even do this in-line), and write the rest of the program in terms of those.
You may want to look at the source code to get an idea of how to do this; a rough sketch of the pattern is given below.
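To illustrate that pattern (this is not the actual Hashmap code), one basic operation could be bound from Java with a native declaration and the rest written in ordinary Frege on top of it. In the sketch below the JDK's java.util.Arrays.copyOfRange stands in for hand-written Java; the names copyOfRange and slice are my own, declaring the binding pure is only justified if the arrays are never mutated, and the exact native-declaration syntax may vary between Frege versions.

-- basic operation implemented on the Java side; here we simply reuse a JDK
-- static method instead of writing the Java ourselves
pure native copyOfRange java.util.Arrays.copyOfRange
    :: JArray Byte -> Int -> Int -> JArray Byte

-- the rest of the program is written in terms of such basic operations,
-- e.g. a tiny slice helper (half-open range, like the Java original)
slice :: JArray Byte -> Int -> Int -> JArray Byte
slice buf from to = copyOfRange buf from to

As the answer notes, such helpers can also be written as in-line Java in the module itself instead of reusing a JDK method.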

Related

Pascal-style arrays, built-in len() function vs .length? What are the pros/cons

What are the differences between these two? Why would you pick one over the other: is it just personal preference, or is there an actual reason why you would use either a built-in function or whatever .length is?
I think using *.length over *.length() or len(*) is kind of a historical artifact, which was probably done to make getting the length of an array as fast as possible. Arrays, after all, are a very basic data structure in many languages, and getting the length of one is an extremely common operation. And accessing a property is much faster than calling a method.
Nowadays a compiler could probably optimize that kind of thing away, but back then I think there was a pull towards ease of implementation which guided many languages to simply expose *.length as a property.
However, in any OOP language at least, it's more consistent to have *.length(), because while arrays have immutable lengths and can afford to expose *.length as a constant value, other data structures to which you can add or remove values would not be able to do this.

Naming convention for Rx extension methods?

So if I'm writing an extension method for a type in another library that converts that type into an Rx IObservable<T>, what exactly is the convention? I ask because I thought AsObservable was the way to go, but I've also seen ToObservable. It's unclear to me which is used when or if there's any real convention at all.
Could it be that ToObservable is reserved for turning something that is expected to produce a single event into an IObservable<T>, whereas AsObservable is reserved for converting something that is expected to produce a sequence of events into an IObservable<T>?
Unless you have a very good reason to write your own cross-duality operators, you will not need to use a "To" prefix when naming your own methods for Enumerables and Observables.
Observe the following truths:
ToObservable is expected to convert pull-based sequences into push-based sequences.
ToEnumerable is expected to convert push-based sequences into pull-based sequences.
AsObservable is expected to wrap a push-based type as IObservable<T>.
AsEnumerable is expected to wrap a pull-based type as IEnumerable<T>.
Therefore, To should be used when you're writing a method which switches the duality of the source, and As should be used when the resulting duality is the same as the source's.
In most cases, you will be using As for your own methods, because the cross-duality operators of ToObservable and ToEnumerable have already been written for you.
Sources: Personal experience, MSDN documentation (above), Erik Meijer himself.
I don't know of any official guidance, but the main metric I would use is to look at the amount of work you are doing. In most cases there is a non-trivial amount of work being done (which is subjective in itself), such as turning an IEnumerable into an IObservable, and there I would use ToObservable. When the method does fairly trivial work, as with the Observable.AsObservable extension method, AsObservable seems the better choice. Another notable difference between these two methods is that AsObservable is not much more than a type cast and doesn't make any real changes to the behavior of the argument, whereas Observable.ToObservable(IEnumerable<T>) returns an object with significantly different semantics.

Scala immutable vs mutable. What is the way one should go?

I'm just learning to program in Scala.
I have some experience in functional programming, as well as in object-oriented programming.
My question is kind of simple, yet tricky:
Which structures should be used in Scala? Should we stick only to immutables, e.g. modifying lists by iterating through them and sticking a new one together, or go for mutables? What is your opinion on that, and what are the performance and memory-related aspects?
I'm likely to program in a functional style, but it often takes an insane amount of effort to do things which are easily done with mutables. Is what to use situation-dependent?
Prefer immutable to mutable state. Use mutable state only where it is absolutely necessary. Some notable reasons include:
Performance. The standard libraries make wide use of vars and while loops, even though this is not idiomatic Scala. This should not be emulated, however, except for cases where you have profiled to determine that modifying the code to be more imperative will bring a significant performance gain.
I/O. I/O, or interacting with the outside world, is inherently state-dependent and thus must be dealt with in a mutable manner.
This is no different than the recommended coding style found in all major languages, imperative or functional. For example, in Java it is preferable to use data objects with only private final fields. Code written in an immutable (and functional) way is inherently easier to understand because when one sees a val, they know it will never change, reducing the possible number of states any particular object or function can be in.
In many cases it also allows automatic parallel execution; for example, collection classes in Scala all have a par method, which returns a parallel collection that automatically runs calls to functions like map or reduce in parallel.
(I thought this must be a duplicate but couldn't easily find an earlier similar one, so I venture to answer...)
There is no general answer to this question. The rule of thumb suggested by the creators of Scala is to start with immutable vals and structures and stick to them as long as it makes sense. You can almost always create a workable solution to your problem this way. But if not, of course be pragmatic and use mutability.
Once you have a solution, you can tweak it, test it, measure its performance etc. If you find that e.g. it is too slow or overly complex, identify the critical part of it, understand what makes it problematic and - if needed - reimplement it using mutable variables, ideally keeping it isolated from the rest of the program. Note though that in many cases, a better solution can be found from within the immutable realm as well, so try looking there first. Especially for a beginner like myself, it still happens regularly that the best solution I can come up with looks contorted and complex, with no apparent way to improve it - until I see a simple and elegant solution to the same problem in a few lines of code, created by an experienced Scala developer who commands more of the power of the language and its libraries.
I usually obey the following rules:
Never use static mutable vars
Keep all user defined data types (typically case classes) immutable unless they are very expensive to copy. This will simplify a lot of the application logic.
If a data structure/collection is inherently mutable (i.e. it's designed to change over time), using a mutable data structure/collection might be appropriate. An example might be a large game world that is updated when players move. Remember to (almost) never share these data structures between threads though.
It's fine to use mutable local vars in methods
Use immutable collections for function results. These can be strictly or lazily evaluated depending on what gives best performance in the used context. Be careful if you use a lazily evaluated result which depends on a mutable collection though.

Requesting a check in my understanding of Objective-C

I have been learning Objective-C as my first language and understand classes, objects, instances, methods, OOP in general, etc. enough to use the language and make simple applications work, but I wanted to check on a few fundamental questions that have never been explained in the examples I followed.
I think the questions are so simple that they will confuse a lot of people, but I hope it will make sense to someone out there.
(While learning Objective-C the authors are assuming I have a basic computer programming background, yet I have found that a basic computer programming background is hard to come by since everyone teaching computer programming assumes you already have one to start teaching you something else. Hence the help with the fundamentals)
Passing and Returning:
When declaring methods with parameters, how does parameter passing actually work if the arguments being passed in can have different names than the parameter names? I hope that makes sense. I know parameter names are variables for that very reason, but...
are the arguments themselves getting mapped to a look up table or something?
Second, do the argument "types" (int, for example) have to match the parameter types in order for them to be passed into the method, and do you always have to make your arguments' values equal the parameter names somewhere else in your code listing before passing them into the method?
Is the following correct: after a method gets executed, it returns a particular value (if it is not void) to the class or instance that called the method in the first place.
Is object-oriented programming really just passing "your" objects' instance methods around, along with the system-generated classes and methods, to produce a result? If we are passing things to methods so they can do some work on them and then return something back, why not do the work in the first place, eliminating the need to pass anything? A theoretical question, I guess? I assume the answer would be: because that would be a crazy big tangled mess of a method with everything happening all at once, but I wanted to ask anyway.
Thank you for your time.
Variables are just places where values are stored. When you pass a variable as an argument, you aren't actually passing the variable itself — you're passing the value of the variable, which is copied into the argument. There's no mapping table or anything — it just takes the value of the variable and sticks it in the argument.
In the case of objects, the variable is a pointer to an object that exists somewhere in the program's memory space. In this case, the value of the pointer is copied just like any other variable, but it still points to the same object.
(the argument "types" … have to match the parameter types…) It isn't technically true that the types have to be the same, though they usually should be. Some types can be automatically converted to another type. For example, a char or short will be promoted to an int if you pass it to a function or method that takes an int. There's a complicated set of rules around type conversions. One thing you usually should not do is use casts to silence compiler warnings about incompatible types — the compiler takes that to mean, "It's OK, I know what I'm doing," even if you really don't. Also, object types cannot ever be converted this way, since the variables are just pointers and the objects themselves live somewhere else. If you assign the value of an NSString* variable to an NSArray* variable, you're just lying to the compiler about what the pointer is pointing to, not turning the string into an array.
Non-void functions and methods return a value to the place where they're called, yes.
(Is object-oriented programming…) Object-oriented programming is a way of structuring your program so that it can be conceptually described as a collection of objects sending messages to each other and doing things in response to those messages.
(why not do the work in the first place eliminating the need to pass anything) The primary problem in computer programming is writing code that humans can understand and improve later. Functions and methods allow us to break our code into manageable chunks that we can reason about. They also allow us to write code once and reuse it all over the place. If we didn't factor repeated code into functions, then we'd have to repeat the code every time it is needed, which both makes the program code much longer and introduces thousands of new opportunities for bugs to creep in. 50,000-line programs would become 500 million-line programs. Not only would the program be horrendously bug-ridden, but it would be such a huge ball of spaghetti that finding the bugs would be a Herculean task.
By the way, I think you might like Uli Kusterer's Masters of the Void. It's a programming tutorial for Mac users who don't know anything about programming.
"If we are passing things to methods so they can do some work to them and then return something back why not do the work in the first place eliminating the need to pass anything?"
In the beginning, that's how it was done.
But then smart programmers noticed that they were repeating copies of some work and also running out of memory, so they decided to put that chunk of work in one central place to save memory, and then call it by passing in the data from where it was before.
They gave the locations where the data was stuffed names, because the programs were big enough that nobody memorized all the numerical addresses for every bit of data any more.
Then really, really big computers finally got more than 16k of memory, and the programs started to become big unmanageable messes, so the practice was codified as part of structured programming. It's now a religious tenet.
But it's still done, by compilers when the inline flag is set, and also sometimes by hand on code that has to be really really fast on some very constrained processors by programmers who know when and where to make targeted trade-offs.
A little reading on the History of Computers is quite informative about how we got to where we are today, and why we do such strange things.
All those type checks are used (at most) only during the compilation stage, to catch errors in the code.
During execution, all variables are just blocks of memory which get sent somewhere. For example, the 'id' type and 'int' are both represented as 4-byte raw values, and you can write (int) and (id) casts to convert one to the other.
As for parameter names: they are used only by the compiler, to know into which memory area to send some data.
That's a simplified explanation - all of this is actually more complicated - but I think you'll get the main idea: during execution there are no variable names or types; everything is done via operations on memory blocks.

Does over-using function calls affect performance? Specifically in Fortran

I habitually write code with lots of functions, I find it makes it clearer. But now I'm writing some code in Fortran which needs to be very efficient, and I'm wondering whether over-using functions will slow it down, or whether the compiler will work out what's going on and optimise?
I know in Java/Python etc each function is an object, and so creating lots of functions would require them to be created in memory. I also know that in Haskell the functions are reduced into each other, so it makes little difference there.
Does anyone know about the case with Fortran? Is there a difference with using intent/pure functions/declaring fewer local variables/anything else?
Function calls carry a performance cost in stack-based languages like Fortran: each call has to set up a new stack frame and so on.
Because of this, most compilers will try to inline function calls aggressively, if it is possible. Most of the time the compiler will make the right choice on whether or not to inline certain functions in your program.
This automatic inlining process means that there is no extra cost at all for writing your function.
This means that you should write your code as cleanly and as well organized as possible, and it is likely that the compiler will do these optimizations for you. It is more important that your overall strategy for solving the problem is efficient than to worry about the performance of individual function calls.
Just write the code in the simplest and most well-structured way you can, then when you have it written and tested you can profile it to see if there are any hotspots which require optimisation. Only at that point should you concern yourself with micro-optimisations, and this may not even be necessary if your compiler is doing its job.
I've just spent all morning tuning an app consisting of mixed C and Fortran, and of course it uses a lot of functions. What I found (and what I usually find) is not that functions are slow, but that certain function calls (and very few of them) don't really have to be done at all. For example, clearing memory blocks, just to be neat, but doing it at high frequency.
This is not a function of language, and not really a function of inlining either. Function calls could be free and you would still have the problem that the call tree tends to be more bushy than necessary. You need to find out where to prune it. This is the "profiling" method I rely on.
Whatever you do, find out what needs to be fixed. Don't Guess. Many people don't think of this kind of question as guessing, but when they find themselves asking "Will this work, will that help?", they're poking in the dark, rather than finding out where the problems are. Once they know where the problems are, the fix is obvious.
Typically, subroutine/function calls in Fortran have very little overhead. While the language standard doesn't specify argument-passing mechanisms, the typical implementation is "by reference", so no copying is involved, only setting up a new procedure call. On most modern architectures this has little overhead. Selecting good algorithms is generally far more important than micro-optimizations.
An exception to calls being quick is the case in which the compiler has to create temporary arrays, for example, if the actual argument is a non-contiguous array subsection and the dummy argument of the called procedure is a plain contiguous array. Suppose that the dummy argument is dimension (:). Calling it with an array of dimension (:) is simple. If you use a non-unit stride in the call, e.g., array (1:12:3), then the argument is non-contiguous and the compiler may need to create a temporary copy. Suppose instead that the actual argument is dimension (:,:). If the call passes array (:,j), the sub-array is contiguous, since in Fortran the first index varies fastest in memory, and shouldn't need a copy. But array (i,:) is non-contiguous and might require a temporary copy.
Some compilers have options to warn you when temporary array copies are needed so that you can change your code, if you wish.