Mocking C library functions in Objective-C

I'm a beginner with Objective-C, but I've used JUnit a fair bit.
For unit testing my code, I need to be able to mock the network, which I'm using through the CFStreamCreatePairWithSocketToHost function.
Per the answer to How to mock a C-function using OCMock, this is impossible in OCMock. Are there any frameworks or techniques that allow this?
I'd rather not commit to using some overcomplicated Java-esque IoC framework just to make my code easier to test, but I must say that one of the benefits of Spring is that issues like this rarely come up when you use everything through an interface and configure the implementations through dependency injection. Is that the way forward here, or is there a simpler solution?

You can rebind (all) system function pointers in memory for testing. There are multiple techniques for doing this (e.g. DYLD_INSERT_LIBRARIES), but the easiest way is to use the fishhook convenience library from Facebook:
https://github.com/facebook/fishhook
#import <stdio.h>
#import <stdlib.h>
#import "fishhook.h"

static void *(*orig_malloc)(size_t size);

void *my_malloc(size_t size)
{
    printf("Allocated: %zu\n", size);
    return orig_malloc(size);
}

/* ... */

- (void)testMyTestCase
{
    // hook malloc
    rebind_symbols((struct rebinding[1]){{"malloc", my_malloc, (void *)&orig_malloc}}, 1);
    char *a = malloc(100);

    // restore the original
    rebind_symbols((struct rebinding[1]){{"malloc", orig_malloc, NULL}}, 1);
    // ...
}

I don't know of any way to mock a C function. If it can be done, it would have to be a feature in the C compiler you're using.
As far as I know, Clang does not have any such feature, so you can't do it.
The CFStreamCreatePairWithSocketToHost() function is at a fixed location in memory; it cannot be moved, nor can anything else be placed at that memory location. It's a system function shared by every app running on the device, so changing it would break every other app that is currently executing.
The inability to do things like this with C functions is one reason why C executes faster than most other programming languages. If you want to be able to mock functions, then do not write your code in the C language. Use Objective-C instead: make a class MyCFStream with a method createPairWithSocketToHost: whose single line of code calls the C function, and use that class everywhere in your app. You can easily mock the method in the class.
Your wrapper method will, however, be much slower than using the built-in C function directly.
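A minimal sketch of such a wrapper, assuming ARC and using the class and method names suggested above (they are not an existing API):

#import <Foundation/Foundation.h>
#import <CoreFoundation/CoreFoundation.h>

@interface MyCFStream : NSObject
- (void)createPairWithSocketToHost:(NSString *)host
                              port:(UInt32)port
                        readStream:(CFReadStreamRef *)readStream
                       writeStream:(CFWriteStreamRef *)writeStream;
@end

@implementation MyCFStream
- (void)createPairWithSocketToHost:(NSString *)host
                              port:(UInt32)port
                        readStream:(CFReadStreamRef *)readStream
                       writeStream:(CFWriteStreamRef *)writeStream
{
    // The only direct call to the C API lives here; everything else in the
    // app talks to this class, whose method a mocking framework can stub.
    CFStreamCreatePairWithSocketToHost(kCFAllocatorDefault,
                                       (__bridge CFStringRef)host,
                                       port,
                                       readStream,
                                       writeStream);
}
@end

Tests can then stub that one method with OCMock, or substitute a fake MyCFStream, without ever touching the real network.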

Another option would be to create a macro that renames the C function so that it calls your wrapper function in debug builds and the system C function in prod. This would allow you to mock/swizzle the Objective-C wrapper for unit testing, but preserve the speed in prod.
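As a sketch of the idea (the wrapper name is made up for illustration; it is not an existing function):

#include <CoreFoundation/CoreFoundation.h>

#if DEBUG
/* Declare a wrapper with the same signature as the system function. */
void MyTestableCFStreamCreatePairWithSocketToHost(CFAllocatorRef alloc,
                                                  CFStringRef host,
                                                  UInt32 port,
                                                  CFReadStreamRef *readStream,
                                                  CFWriteStreamRef *writeStream);

/* Every call site written against the system name now hits the wrapper,
   which can forward to a mock or to the real function. Prod builds omit
   the macro and call the real function directly, with no extra cost. */
#define CFStreamCreatePairWithSocketToHost MyTestableCFStreamCreatePairWithSocketToHost
#endif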
There is one serious concern with this approach in that it changes your implementation between debug and prod, but it is a choice. In this case, I'd highly suggest having some integration tests and running them in a prod environment to confirm, to a pretty high degree of certainty, that the code is functioning as intended.
Anyway, if this interests you, please know/learn what you're doing, as this can be a fairly dangerous approach if done incorrectly.

Related

Why does Math.sin() delegate to StrictMath.sin()?

I was wondering why Math.sin(double) delegates to StrictMath.sin(double); I came across the issue in a Reddit thread. The code fragment in question looks like this (JDK 7u25):
Math.java:
public static double sin(double a) {
    return StrictMath.sin(a); // default impl. delegates to StrictMath
}
StrictMath.java:
public static native double sin(double a);
The second declaration is native, which is reasonable to me. The doc of Math states that:
Code generators are encouraged to use platform-specific native libraries or microprocessor instructions, where available (...)
And the question is: isn't the native library that implements StrictMath platform-specific enough? What more can a JIT know about the platform than an installed JRE (please concentrate only on this very case)? In other words, why isn't Math.sin() native already?
I'll try to wrap up the entire discussion in a single post.
Generally, Math delegates to StrictMath. Obviously, the call can be inlined, so this is not a performance issue.
StrictMath is a final class with native methods backed by native libraries. One might think that native means optimal, but this doesn't necessarily have to be the case. Looking through the StrictMath javadoc, one can read the following:
(...) the definitions of some of the numeric functions in this package require that they produce the same results as certain published algorithms. These algorithms are available from the well-known network library netlib as the package "Freely Distributable Math Library," fdlibm. These algorithms, which are written in the C programming language, are then to be understood as executed with all floating-point operations following the rules of Java floating-point arithmetic.
The way I understand this doc is that the native library implementing StrictMath is implemented in terms of the fdlibm library, which is multi-platform and known to produce predictable results. Because it's multi-platform, it can't be expected to be an optimal implementation on every platform, and I believe that this is the place where a smart JIT can fine-tune the actual performance, e.g. by statistical analysis of input ranges and adjusting the algorithms/implementation accordingly.
Digging deeper into the implementation, it quickly turns out that the native library backing StrictMath actually uses fdlibm.
StrictMath.c in OpenJDK 7 looks like this:
#include "fdlibm.h"
...
JNIEXPORT jdouble JNICALL
Java_java_lang_StrictMath_sin(JNIEnv *env, jclass unused, jdouble d)
{
    return (jdouble) jsin((double)d);
}
and the sine function is defined in fdlibm/src/s_sin.c, referring in a few places to the __kernel_sin function that comes directly from the header fdlibm.h.
While I'm temporarily accepting my own answer, I'd be glad to accept a more competent one when it comes up.
Why does Math.sin() delegate to StrictMath.sin()?
The JIT compiler should be able to inline the StrictMath.sin(a) call. So there's little point creating an extra native method for the Math.sin() case ... and adding extra JIT compiler smarts to optimize the calling sequence, etcetera.
In the light of that, your objection really boils down to an "elegance" issue. But the "pragmatic" viewpoint is more persuasive:
Fewer native calls makes the JVM core and JIT easier to maintain, less fragile, etcetera.
If it ain't broken, don't fix it.
At least, that's how I imagine how the Java team would view this.
The question assumes that the JVM actually runs the delegation code. On many JVMs, it won't. Calls to Math.sin(), etc.. will potentially be replaced by the JIT with some intrinsic function code (if suitable) transparently. This will typically be done in an unobservable way to the end user. This is a common trick for JVM implementers where interesting specializations can happen (even if the method is not tagged as native).
Note however that most platforms can't simply drop in a single processor instruction for sin, because the hardware instruction is only accurate over a limited input range (e.g. see the Intel discussion).
The Math API permits non-strict but better-performing implementations of its methods, but it does not require them, and by default Math simply uses the StrictMath implementation.

Is overriding Objective-C framework methods ever a good idea?

ObjC has a unique way of overriding methods. Specifically, you can override functions in OSX's own frameworks, via "categories" or "swizzling". You can even override "buried" functions only used internally.
Can someone provide me with an example where there was a good reason to do this? Something you would use in released commercial software and not just some hacked-up tool for internal use?
For example, maybe you wanted to improve on some built in method, or maybe there was a bug in a framework method you wanted to fix.
Also, can you explain why this can best be done with features in ObjC, and not in C++ / Java and the like. I mean, I've heard of the ability to load a C library, but allow certain functions to be replaced, with functions of the same name that were previously loaded. How is ObjC better at modifying library behaviour than that?
If you're extending the question from mere swizzling to actual library modification then I can think of useful examples.
As of iOS 5, NSURLConnection provides sendAsynchronousRequest:queue:completionHandler:, which is a block (/closure) driven way to perform an asynchronous load from any resource identifiable with a URL (local or remote). It's a very useful way to be able to proceed as it makes your code cleaner and smaller than the classical delegate alternative and is much more likely to keep the related parts of your code close to one another.
That method isn't supplied in iOS 4. So what I've done in my project is that, when the application is launched (via a suitable + (void)load), I check whether the method is defined. If not I patch an implementation of it onto the class. Henceforth every other part of the program can be written to the iOS 5 specification without performing any sort of version or availability check exactly as if I was targeting iOS 5 only, except that it'll also run on iOS 4.
In Java or C++ I guess the same sort of thing would be achieved by creating your own class to issue URL connections that performs a runtime check each time it is called. That's a worse solution because it's more difficult to step back from. This way around if I decide one day to support iOS 5 only I simply delete the source file that adds my implementation of sendAsynchronousRequest:.... Nothing else changes.
As for method swizzling, the only times I see it suggested are where somebody wants to change the functionality of an existing class and doesn't have access to the code in which the class is created. So you're usually talking about trying to modify logically opaque code from the outside by making assumptions about its implementation. I wouldn't really support that as an idea on any language. I guess it gets recommended more in Objective-C because Apple are more prone to making things opaque (see, e.g. every app that wanted to show a customised camera view prior to iOS 3.1, every app that wanted to perform custom processing on camera input prior to iOS 4.0, etc), rather than because it's a good idea in Objective-C. It isn't.
EDIT: so, in further exposition — I can't post full code because I wrote it as part of my job, but I have a class named NSURLConnectionAsyncForiOS4 with an implementation of sendAsynchronousRequest:queue:completionHandler:. That implementation is actually quite trivial, just dispatching an operation to the nominated queue that does a synchronous load via the old sendSynchronousRequest:... interface and then posts the results from that on to the handler.
That class has a + (void)load, which is the class method you add to a class that will be issued immediately after that class has been loaded into memory, effectively as a global constructor for the metaclass and with all the usual caveats.
In my +load I use the Objective-C runtime directly via its C interface to check whether sendAsynchronousRequest:... is defined on NSURLConnection. If it isn't then I add my implementation to NSURLConnection, so from henceforth it is defined. This explicitly isn't swizzling — I'm not adjusting the existing implementation of anything, I'm just adding a user-supplied implementation of something if Apple's isn't available. Relevant runtime calls are objc_getClass, class_getClassMethod and class_addMethod.
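In outline, the shape of that class and its +load is something like this (reconstructed, not the code I actually shipped; only the runtime calls and selectors are the real ones):

#import <Foundation/Foundation.h>
#import <objc/runtime.h>

@interface NSURLConnectionAsyncForiOS4 : NSObject
@end

@implementation NSURLConnectionAsyncForiOS4

// The shim itself: a trivial implementation in terms of the old
// synchronous API, dispatched to the nominated queue.
+ (void)sendAsynchronousRequest:(NSURLRequest *)request
                          queue:(NSOperationQueue *)queue
              completionHandler:(void (^)(NSURLResponse *, NSData *, NSError *))handler
{
    [queue addOperationWithBlock:^{
        NSURLResponse *response = nil;
        NSError *error = nil;
        NSData *data = [NSURLConnection sendSynchronousRequest:request
                                             returningResponse:&response
                                                         error:&error];
        handler(response, data, error);
    }];
}

+ (void)load
{
    Class connectionClass = objc_getClass("NSURLConnection");
    SEL sel = @selector(sendAsynchronousRequest:queue:completionHandler:);

    // Only patch when the OS doesn't already provide the method (i.e. iOS 4).
    if (!class_getClassMethod(connectionClass, sel)) {
        Method shim = class_getClassMethod(self, sel);
        // Class methods live on the metaclass, hence object_getClass().
        class_addMethod(object_getClass(connectionClass), sel,
                        method_getImplementation(shim),
                        method_getTypeEncoding(shim));
    }
}

@end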
In the rest of the code, whenever I want to perform an asynchronous URL connection I just write e.g.
[NSURLConnection sendAsynchronousRequest:request
                                   queue:[self anyBackgroundOperationQueue]
                       completionHandler:
    ^(NSURLResponse *response, NSData *data, NSError *blockError)
    {
        if(blockError)
        {
            // oh dear; was it fatal?
        }

        if(data)
        {
            // hooray! You know, unless this was an HTTP request, in
            // which case I should check the response code, etc.
        }

        /* etc */
    }];
So the rest of my code is just written to the iOS 5 API and neither knows nor cares that I have a shim somewhere else to provide that one microscopic part of the iOS 5 changes on iOS 4. And, as I say, when I stop supporting iOS 4 I'll just delete the shim from the project and all the rest of my code will continue not to know or to care.
I had similar code to supply an alternative partial implementation of NSJSONSerialization (which dynamically created a new class in the runtime and copied methods to it); the one adjustment you need to make is that references to NSJSONSerialization elsewhere will be resolved once, at load time, by the linker, which you don't really want. So I added a quick #define of NSJSONSerialization to NSClassFromString(@"NSJSONSerialization") in my precompiled header. That is less functionally neat, but a similar line of action in terms of finding a way to keep iOS 4 support for the time being while writing the rest of the project to the iOS 5 standards.
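In other words, the precompiled header ends up containing a line along these lines (exactly the trick just described):

#define NSJSONSerialization NSClassFromString(@"NSJSONSerialization")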
There are both good and bad cases. Since you didn't mention anything in particular these examples will be all-over-the-place.
It's perfectly normal (good idea) to override framework methods when subclassing:
When subclassing NSView (from the AppKit.framework), it's expected that you override drawRect:(NSRect). It's the mechanism used for drawing views.
When creating a custom NSMenu, you could override insertItemWithTitle:action:keyEquivalent:atIndex: and any other methods...
The main thing when subclassing is whether your behaviour completely re-defines the old behaviour... or extends it (in which case your override eventually calls [super ...];).
That said, however, you should always steer clear of using (and overriding) any private API methods (those normally have an underscore prefix in their name). That is a bad idea.
You also should not override existing methods via categories. That's also bad: it has undefined behaviour.
If you're talking about categories, you don't override methods with them (there is no way to call the original method, the way you would call super when subclassing); you can only completely replace a method with your own, which makes the whole idea mostly pointless. Categories are only useful for safely extending functionality, and that's the only use I have ever seen (and which is a very good, an excellent idea), although indeed they can be used for dangerous things.
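For example, a purely additive category of the safe kind looks like this (the prefixed method name is just for illustration):

@interface NSString (MyAdditions)
- (BOOL)my_isBlank; // prefixed to avoid colliding with any existing or future method
@end

@implementation NSString (MyAdditions)
- (BOOL)my_isBlank
{
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    return [[self stringByTrimmingCharactersInSet:ws] length] == 0;
}
@end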
If you mean overriding by subclassing, that is not unique. But in Obj-C you can override everything, even private undocumented methods, not just what was declared 'overridable' as in other languages. Personally, I think it's nice; I remember that in Delphi and C++ I used to "hack" access to private and protected members to work around an internal bug in a framework. It's not a good idea, but at some moments it can be a life saver.
There is also method swizzling, but that's not a standard language feature; it's a hack. Hacking undocumented internals is rarely a good idea.
And regarding "why this can best be done with features in ObjC": the answer is simple. Obj-C is dynamic, and this freedom is common to almost all dynamic languages (JavaScript, Python, Ruby, Io, and many more). Unless artificially disabled, every dynamic language has it.
Refer to the Wikipedia page on dynamic languages for a longer explanation and more examples. For example, an even more miraculous thing possible in Obj-C and other dynamic languages is that an object can change its type (class) in place, without recreation.

STM32 programming tips and questions

I could not find any good documentation on the internet about STM32 programming. STM's own documents do not explain much beyond the register functions. I would greatly appreciate it if anyone could answer the following questions.
I noticed that in all example programs that STM provides, local variables for main() are always defined outside of the main() function (with occasional use of the static keyword). Is there any reason for that? Should I follow a similar practice? Should I avoid using local variables inside main()?
I have a global variable which is updated within the clock interrupt handler. I am using the same variable inside another function as a loop condition. Don't I need to access this variable using some form of atomic read operation? How can I know that a clock interrupt does not change its value in the middle of the function's execution? Do I need to disable the clock interrupt every time I need to use this variable inside a function? (That seems extremely inefficient to me, since I use it as a loop condition. I believe there should be better ways of doing it.)
Keil automatically inserts startup code which is written in assembly (i.e. startup_stm32f4xx.s). This startup code has the following import statements:
IMPORT SystemInit
IMPORT __main
In C, this makes sense. However, in C++ both main and SystemInit would have different (mangled) names (e.g. _int_main__void). How can this startup code still work in C++ even without using extern "C"? (I tried it and it worked.) How can the C++ linker (armcc --cpp) associate these statements with the correct functions?
You can use local or global variables. Using locals in embedded systems carries a risk of your stack colliding with your data; with globals you don't have that problem. But this is true no matter where you are: embedded microcontroller, desktop, etc.
I would make a copy of the global in the foreground task that uses it:
unsigned int myglobal;

void fun ( void )
{
    unsigned int myg;

    myg = myglobal;
    /* ... use only myg from here on ... */
}
Then only use myg for the rest of the function. Basically you are taking a snapshot and using the snapshot. You would want to do the same thing if you were reading a register: if you want to do multiple things based on a sample of something, take one sample of it and make decisions on that one sample; otherwise the item can change between samples. If you are using one global to communicate back and forth with the interrupt handler, I would use two variables: one foreground-to-interrupt, the other interrupt-to-foreground. Yes, there are times where you need to carefully manage a shared resource like that; normally it has to do with cases where you need to do more than one thing, for example if you have several items that all need to change as a group before the handler can see them change, then you need to disable the interrupt handler until all the items have changed. Here again there is nothing special about embedded microcontrollers; this is all basic stuff you would see on a desktop system with a full-blown operating system.
Keil knows what they are doing; if they support C++ then at a system level they have this worked out. I don't use Keil; I use gcc and llvm for microcontrollers like this one.
Edit:
Here is an example of what I am talking about
https://github.com/dwelch67/stm32vld/tree/master/stm32f4d/blinker05
It's an STM32 example using timer-based interrupts; the interrupt handler modifies a variable shared with the foreground task. The foreground task takes a single snapshot of the shared variable (per loop) and, if need be, uses the snapshot more than once in the loop rather than the shared variable, which can change. This is C, not C++, I understand that, and I am using gcc and llvm, not Keil. (Note: llvm has a known problem optimizing tight while loops, a very old bug; I don't know why they have no interest in fixing it, but llvm works for this example.)
Question 1: Local variables
The sample code provided by ST is not particularly efficient or elegant. It gets the job done, but sometimes there are no good reasons for the things they do.
In general, you always want your variables to have the smallest scope possible. If you only use a variable in one function, define it inside that function. Add the "static" keyword to local variables if and only if you need them to retain their value after the function is done.
In some embedded environments, like the PIC18 architecture with the C18 compiler, local variables are much more expensive (more program space, slower execution time) than global. On the Cortex M3, that is not true, so you should feel free to use local variables. Check the assembly listing and see for yourself.
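A trivial illustration of that distinction (my own example, not taken from ST's code):

void count_calls(void)
{
    static unsigned int calls = 0; /* keeps its value across invocations */
    unsigned int scratch = 0;      /* re-initialised on every call       */

    calls++;
    scratch++;
    /* after the Nth call: calls == N, scratch == 1 */
}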
Question 2: Sharing variables between interrupts and the main loop
People have written entire chapters explaining the answers to this group of questions. Whenever you share a variable between the main loop and an interrupt, you should definitely use the volatile keyword on it. Variables of 32 or fewer bits can be accessed atomically (unless they are misaligned).
If you need to access a larger variable, or two variables at the same time from the main loop, then you will have to disable the clock interrupt while you are accessing the variables. If your interrupt does not require precise timing, this will not be a problem. When you re-enable the interrupt, it will automatically fire if it needs to.
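For example, with the CMSIS intrinsics, reading a larger (here 64-bit) shared variable from the main loop might look like this sketch (the variable and function names are mine, and the device header name assumes an STM32F4 project):

#include <stdint.h>
#include "stm32f4xx.h"  /* CMSIS device header; provides __disable_irq()/__enable_irq() */

volatile uint64_t shared_timestamp;  /* 64-bit, so not atomic on a 32-bit core */

uint64_t read_shared_timestamp(void)
{
    uint64_t copy;

    __disable_irq();          /* keep the clock interrupt from updating it mid-read */
    copy = shared_timestamp;
    __enable_irq();           /* a pending interrupt fires now, as described above  */

    return copy;
}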
Question 3: main function in C++
I'm not sure. You can use arm-none-eabi-nm (or whatever nm is called in your toolchain) on your object file to see what symbol name the C++ compiler assigns to main(). I would bet that C++ compilers refrain from mangling the main function for this exact reason, but I'm not sure.
STM's sample code is not an exemplar of good coding practice, it is merely intended to exemplify use of their standard peripheral library (assuming those are the examples you are talking about). In some cases it may be that variables are declared external to main() because they are accessed from an interrupt context (shared memory). There is also perhaps a possibility that it was done that way merely to allow the variables to be watched in the debugger from any context; but that is not a reason to copy the technique. My opinion of STM's example code is that it is generally pretty poor even as example code, let alone from a software engineering point of view.
In this case your clock interrupt variable is atomic so long as it is 32 bits or less and you are not using read-modify-write semantics with multiple writers. You can safely have one writer and multiple readers regardless. This is true for this particular platform but not necessarily universally; the answer may be different for 8- or 16-bit systems, or for multi-core systems, for example. The variable should be declared volatile in any case.
I am using C++ on STM32 with Keil, and there is no problem. I am not sure why you think that the C++ entry points are different; they are not here (Keil ARM-MDK v4.22a). The start-up code calls SystemInit(), which initialises the PLL and memory timing, for example, then calls __main(), which performs global static initialisation and then calls C++ constructors for global static objects before calling main(). If in doubt, step through the code in the debugger. It is important to note that __main() is not the main() function you write for your application; it is a wrapper with different behaviour for C and C++, but which ultimately calls your main() function.

Creating an Objective-C API

I have never made an API in Objective-C, and I need to do this now.
The "idea" is that I build an API which can be implemented into other applications. Much like Flurry, only for other purposes.
When starting the API, a username, password and mode should be entered. The mode should be either LIVE or BETA (I guess this should be an NSString(?)); afterwards it should be fine with [MyAPI doSomething:withThisObject]; etc.
So to start it: [MyAPI username:@"Username" password:@"Password" mode:@"BETA"];
Can anyone help me out with some tutorials and pointer on how to learn this best?
It sounds like what you want to do is build a static library. This is a compiled .a file containing object code that you'll distribute to a client along with a header file containing the interface. This post is a little outdated but has some good starting points. Or, if you don't mind giving away your source code, you could just deliver a collection of source files to your client.
In terms of developing the API itself, it should be very similar to the way you'd design interfaces and implementations of Objective-C objects in your own apps. You'll have a MyAPI class with functions for initialization, destruction, and all the functionality you want. You could also have multiple classes with different functionality if the interface is complex. Because you've capitalized MyAPI in your code snippet, it looks like you want to use it by calling the class rather than an instance of the class - which is a great strategy if you think you'll only ever need one instance. To accomplish this you can use the singleton pattern.
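A minimal sketch of that singleton approach (the property names are my own invention, and the mode stays an NSString to match your snippet):

#import <Foundation/Foundation.h>

@interface MyAPI : NSObject
+ (instancetype)sharedInstance;
+ (void)username:(NSString *)username password:(NSString *)password mode:(NSString *)mode;
@end

@interface MyAPI ()
@property (nonatomic, copy) NSString *username;
@property (nonatomic, copy) NSString *password;
@property (nonatomic, copy) NSString *mode;
@end

@implementation MyAPI

+ (instancetype)sharedInstance
{
    static MyAPI *shared = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        shared = [[MyAPI alloc] init];
    });
    return shared;
}

// Class-level entry point, matching the call in the question; it just
// stores the credentials on the single shared instance for later use.
+ (void)username:(NSString *)username password:(NSString *)password mode:(NSString *)mode
{
    MyAPI *api = [self sharedInstance];
    api.username = username;
    api.password = password;
    api.mode     = mode;
}

@end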
Because you've used a username and password, I imagine your API will interface with the web internally. I've found parsing JSON to be very straightforward in Objective-C - it's easy to send requests and get information from a server.
Personally I would use an enum of unsigned ints rather than an NSString, just because it simplifies comparisons and such. So you could do something like:
enum {
    MYAPI_MODE_BETA,
    MYAPI_MODE_LIVE,
    NUM_MYAPI_MODES
};
And then call:
[MyAPI username:@"Username" password:@"Password" mode:MYAPI_MODE_BETA];
Also makes it easy to check if they've supplied a valid mode. (Must be less than NUM_MYAPI_MODES.)
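That check can be as simple as this (assuming the mode parameter is an unsigned integer type):

// Inside the API's entry point:
if (mode >= NUM_MYAPI_MODES) {
    [NSException raise:NSInvalidArgumentException
                format:@"Unknown MyAPI mode: %lu", (unsigned long)mode];
}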
Good luck!

Can someone explain to me why I would need functional programming instead of OOP? [duplicate]

Possible Duplicate:
Functional programming vs Object Oriented programming
Can someone explain to me why I would need functional programming instead of OOP?
E.g. why would I need to use Haskell instead of C++ (or a similar language)?
What are the advantages of functional programming over OOP?
One of the big things I prefer in functional programming is the lack of "spooky action at a distance". What you see is what you get – and no more. This makes code far easier to reason about.
Let's use a simple example. Let's say I come across the code snippet X = 10 in either Java (OOP) or Erlang (functional). In Erlang I can know these things very quickly:
The variable X is in the immediate context I'm in. Period. It's either a parameter passed in to the function I'm reading or it's being assigned the first (and only—c.f. below) time.
The variable X has a value of 10 from this point onward. It will not change again within the block of code I'm reading. It cannot.
In Java it's more complicated:
The variable X might be defined as a parameter.
It might be defined somewhere else in the method.
It might be defined as part of the class the method is in.
Whatever the case is, since I'm not declaring it here, I'm changing its value. This means I don't know what the value of X will be without constantly scanning backward through the code to find the last place it was assigned or modified explicitly or implicitly (like in a for loop).
When I call another method, if X happens to be a class variable it may change out from underneath me with no way for me to know this without inspecting the code of that method.
In the context of a threading program it's even worse. X can be changed by something I can't even see in my immediate environment. Another thread may be calling the method in #5 that modifies X.
And Java is a relatively simple OOP language. The number of ways that X can be screwed around with in C++ is even higher and potentially more obscure.
And the thing is? This is just a simple example of how a common operation can be far more complicated in an OOP (or other imperative) language than in a functional. It also doesn't address the benefits of functional programming that don't involve mutable state, etc. like higher order functions.
There are three things about Haskell that I think are really cool:
1) It's a statically-typed language that is extremely expressive and lets you build highly maintainable and refactorable code quickly. There's been a big debate between statically typed languages like Java and C# and dynamic languages like Python and Ruby. Python and Ruby let you quickly build programs, using only a fraction of the number of lines required in a language like Java or C#. So, if your goal is to get to market quickly, Python and Ruby are good choices. But, because they're dynamic, refactoring and maintaining your code is tough. In Java, if you want to add a parameter to a method, it's easy to use the IDE to find all instances of the method and fix them. And if you miss one, the compiler catches it. With Python and Ruby, refactoring mistakes will only be caught as run-time errors. So, with traditional languages, you get to choose between quick development and lousy maintainability on the one hand and slow development and good maintainability on the other hand. Neither choice is very good.
But with Haskell, you don't have to make this type of choice. Haskell is statically typed, just like Java and C#. So, you get all the refactorability, potential for IDE support, and compile-time checking. But at the same time, types can be inferred by the compiler. So, they don't get in your way like they do with traditional static languages. Plus, the language offers many other features that allow you to accomplish a lot with only a few lines of code. So, you get the speed of development of Python and Ruby along with the safety of static languages.
2) Parallelism. Because functions don't have side effects, it's much easier for the compiler to run things in parallel without much work from you as a developer. Consider the following pseudo-code:
a = f x
b = g y
c = h a b
In a pure functional language, we know that functions f and g have no side effects. So, there's no reason that f has to be run before g. The order could be swapped, or they could be run at the same time. In fact, we really don't have to run f and g at all until their values are needed in function h. This is not true in a traditional language since the calls to f and g could have side effects that could require us to run them in a particular order.
As computers get more and more cores on them, functional programming becomes more important because it allows the programmer to easily take advantage of the available parallelism.
3) The final really cool thing about Haskell is also possibly the most subtle: lazy evaluation. To understand this, consider the problem of writing a program that reads a text file and prints out the number of occurrences of the word "the" on each line of the file. Suppose you're writing in a traditional imperative language.
Attempt 1: You write a function that opens the file and reads it one line at a time. For each line, you calculate the number of "the's", and you print it out. That's great, except your main logic (counting the words) is tightly coupled with your input and output. Suppose you want to use that same logic in some other context? Suppose you want to read text data off a socket and count the words? Or you want to read the text from a UI? You'll have to rewrite your logic all over again!
Worst of all, what if you want to write an automated test for your new code? You'll have to build input files, run your code, capture the output, and then compare the output against your expected results. That's do-able, but it's painful. Generally, when you tightly couple IO with logic, it becomes really difficult to test the logic.
Attempt 2: So, let's decouple IO and logic. First, read the entire file into a big string in memory. Then, pass the string to a function that breaks the string into lines, counts the "the's" on each line, and returns a list of counts. Finally, the program can loop through the counts and output them. It's now easy to test the core logic since it involves no IO. It's now easy to use the core logic with data from a file or from a socket or from a UI. So, this is a great solution, right?
Wrong. What if someone passes in a 100GB file? You'll blow out your memory since the entire file must be loaded into a string.
Attempt 3: Build an abstraction around reading the file and producing results. You can think of these abstractions as two interfaces. The first has methods nextLine() and done(). The second has outputCount(). Your main program implements nextLine() and done() to read from the file, while outputCount() just directly prints out the count. This allows your main program to run in constant memory. Your test program can use an alternate implementation of this abstraction that has nextLine() and done() pull test data from memory, while outputCount() checks the results rather than outputting them.
This third attempt works well at separating the logic and the IO, and it allows your program to run in constant memory. But, it's significantly more complicated than the first two attempts.
In short, traditional imperative languages (whether static or dynamic) frequently leave developers making a choice between
a) Tight coupling of IO and logic (hard to test and reuse)
b) Load everything into memory (not very efficient)
c) Building abstractions (complicated, and it slows down implementation)
These choices come up when reading files, querying databases, reading sockets, etc. More often than not, programmers seem to favor option A, and unit tests suffer as a consequence.
So, how does Haskell help with this? In Haskell, you would solve this problem exactly like in Attempt 2. The main program loads the whole file into a string. Then it calls a function that examines the string and returns a list of counts. Then the main program prints the counts. It's super easy to test and reuse the core logic since it's isolated from the IO.
But what about memory usage? Haskell's lazy evaluation takes care of that for you. So, even though your code looks like it loaded the whole file contents into a string variable, the whole contents really aren't loaded. Instead, the file is only read as the string is consumed. This allows it to be read one buffer at a time, and your program will in fact run in constant memory. That is, you can run this program on a 100GB file, and it will consume very little memory.
Similarly, you can query a database, build a resulting list containing a huge set of rows, and pass it to a function to process. The processing function has no idea that the rows came from a database, so it's decoupled from its IO. And under the covers, the list of rows will be fetched lazily and efficiently. So, even though it looks that way when you read your code, the full list of rows is never all in memory at the same time.
End result, you can test your function that processes the database rows without even having to connect to a database at all.
Lazy evaluation is really subtle, and it takes a while to get your head around its power. But, it allows you to write nice simple code that is easy to test and reuse.
Here's the final Haskell solution and the Approach 3 Java solution. Both use constant memory and separate IO from processing so that testing and reuse are easy.
Haskell:
module Main where

import System.Environment (getArgs)
import Data.Char (toLower)

main = do
    (fileName : _) <- getArgs
    fileContents <- readFile fileName
    mapM_ (putStrLn . show) $ getWordCounts fileContents

getWordCounts = (map countThe) . lines . map toLower
    where countThe = length . filter (== "the") . words
Java:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.Reader;

class CountWords {
    public interface OutputHandler {
        void handle(int count) throws Exception;
    }

    static public void main(String[] args) throws Exception {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(new File(args[0])));
            OutputHandler handler = new OutputHandler() {
                public void handle(int count) throws Exception {
                    System.out.println(count);
                }
            };
            countThe(reader, handler);
        } finally {
            if (reader != null) reader.close();
        }
    }

    static public void countThe(BufferedReader reader, OutputHandler handler) throws Exception {
        String line;
        while ((line = reader.readLine()) != null) {
            int num = 0;
            for (String word : line.toLowerCase().split("([.,!?:;'\"-]|\\s)+")) {
                if (word.equals("the")) {
                    num += 1;
                }
            }
            handler.handle(num);
        }
    }
}
If we compare Haskell and C++, functional programming makes debugging extremely easy, because there is no mutable state and no variables like the ones found in C, Python, etc. that you constantly have to keep track of, and it's guaranteed that, given some arguments, a function will always return the same result no matter how many times you evaluate it.
OOP is orthogonal to any programming paradigm, and there are languages which combine FP with OOP: OCaml being the most popular, several Haskell implementations, etc.