In a very interesting post from 2001 Allen Wirfs-Brock explains how to implement block closures without reifying the (native) stack.
Of the many ideas he presents, there is one that I don't quite understand, and I thought it would be a good idea to ask about it here. He says:
Any variable that can never be assigned during the lifetime of a block (e.g., arguments of enclosing methods and blocks) need not be placed in the environment if instead a copy of the variable is placed in the closure when it is created
There are two things I'm not sure I understand well enough:
1. Why is using two copies of the read-only variable faster than moving the variable to the environment? Is it because it would be faster for the enclosing context to access the (original) variable in the stack?
2. How can we ensure that the two variables remain synchronized?
In question 1 there must be another reason. Otherwise I don't see the gain (when compared with the cost of implementing the optimization.)
For question 2, take a non-argument variable that is assigned in the method but not in the block. Why would the oop stored in the stack remain unchanged during the life of the block?
I think I know the answer to Q2: Because the execution of the block cannot be intertwined with the execution of the method, i.e., while the block lives, the enclosing context does not run. But isn't there any way to modify the stack temporary while the block is alive?
Thanks to the comment of @aka.nice I found the answers to the two questions in Clement Bera's post, whose reading is both pleasant and clarifying.
For Q1 let's first say that Allen's remark means that the copy of the read-only variable can be placed in the block's stack, as if it were a local temporary of the block. The advantage of doing this only materializes if all variables defined outside the block and used inside it are never written in the block. Under these circumstances there would be no need to create the environment array and to emit any prolog or epilog to take care of it.
The machine code that accesses a stack variable is equivalent to the code required to access an environment variable: the first addresses the location using [ebp + offset], while the second uses [edi + offset] once edi has been set to point to the environment array (tempVector in Clement's notation.) So, there is no gain if some but not all of the environment variables are read-only.
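The difference between the two strategies can be sketched in a few lines. This is only a rough model in Python, not Wirfs-Brock's or Bera's actual implementation; the class names CopyingClosure and EnvironmentClosure and the two helper functions are invented for illustration.

```python
# Hypothetical model of the two ways a block can reach outer variables.

class CopyingClosure:
    """Holds its own copy of each read-only outer variable."""

    def __init__(self, code, copied_values):
        self.code = code
        self.copied = tuple(copied_values)  # snapshot taken at creation time

    def value(self, *args):
        return self.code(self.copied, *args)


class EnvironmentClosure:
    """Shares a heap-allocated environment array with the enclosing method."""

    def __init__(self, code, environment):
        self.code = code
        self.environment = environment  # shared list, playing the tempVector role

    def value(self, *args):
        return self.code(self.environment, *args)


def make_adder(n):
    # n is an argument and is never assigned, so copying it is safe
    # and no environment array has to be allocated.
    return CopyingClosure(lambda copied, x: x + copied[0], [n])


def make_counter():
    # The count is assigned inside the block, so it has to live in an
    # environment that both the method and the block can see.
    environment = [0]

    def body(env):
        env[0] += 1
        return env[0]

    return EnvironmentClosure(body, environment), environment
```

In this model make_adder never allocates an environment, which is the saving described above; make_counter cannot avoid it because the block writes to the shared variable.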
The second question is also answered in Clement's excellent blog. Yes, there is another way to break the synchrony between the original variable and its copy in the block's stack: the debugger (as aka.nice would have told us!) If the programmer modifies the variable in the enclosing context, the debugger will need to detect the action and update the copy as well. Same if the programmer modifies the copy held in the block's stack.
I'm glad I decided to post the question here. The help I received from aka.nice and Clement Bera, plus the comments some people sent me by email helped a lot in augmenting my understanding.
One final remark. Wirfs-Brock claims that avoiding the reification of method contexts is mandatory. I tend to agree. However, many important operations on these data structures can be better implemented if the reification follows the lightweight pattern. More precisely, when debugging you can model these contexts with "viewers" that point to the native stack and use two indexes to delimit the portion that corresponds to the activation under analysis. This is both efficient and clean, and the combination of both techniques leads to the best of both worlds because you can have speed and expressiveness at once. Smalltalk is amazing.
Is there some way to highlight similar or alternative code blocks in Xcode or in another Objective-C tool? In the attached picture you can quickly see which block runs after another and which is an alternative (if-else). (The code in the picture is just an example.) This seems to be a brace-counting task, so I expect an implementation exists.
Actually, I could understand the FLOW of the code on the picture only after I highlighted it as you see.
What you are asking for is scope highlighting. Xcode does not do this to my knowledge. However, you can mouse over the code folding column to the left and see the scope briefly.
Since this is kind of a non-answer, I originally wrote it as a comment. But it got too long, so I'm putting it here. Apologies in advance for avoiding the question. But I'm trying to address the problem behind the question…
If you have to color-code a method to understand it, the method is too long. Extract methods and give them meaningful names that clearly state their purpose (though not their implementation).
Here are rules-of-thumb I follow:
If a method has many local variables, or a number of different indentations, first use Extract Class to pull the method into a new class that works as a function object. Then promote the local variables to ivars.
Despite many who state the dangers, if an inner scope has one line, I omit the braces. This reduces the vertical distance of the code, which makes it more readable. Readability is more important. (But this may be risky if your code isn't covered by unit tests. So make sure the method is well-covered.)
When I see braces inside a method, I try to extract that portion into another (well-named) method.
Look for opportunities to extract the contents of if statements into predicate methods that express the what (burying the how inside the method).
I try to keep methods under six lines. Any more than that, and I start eyeing it critically: Is it doing more than one thing? Is it operating at more than one level of abstraction?
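As a small illustration of the predicate-extraction rule, here is a sketch in Python rather than Objective-C. The Order type, the threshold, and the pricing formula are all made up for the example.

```python
from dataclasses import dataclass

FREE_SHIPPING_THRESHOLD = 50.0  # hypothetical business rule


@dataclass
class Order:
    total: float
    customer_is_member: bool
    item_count: int


def qualifies_for_free_shipping(order):
    # The predicate name states the "what"; the "how" stays in here.
    return order.customer_is_member or order.total >= FREE_SHIPPING_THRESHOLD


def shipping_cost(order):
    # The caller now reads as a sentence and has a single level of nesting.
    if qualifies_for_free_shipping(order):
        return 0.0
    return 4.0 + 0.5 * order.item_count
```

Before the extraction, the condition `order.customer_is_member or order.total >= 50.0` would sit inline in shipping_cost, and a reader would have to work out what it means.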
For much, much more on these principles, I highly recommend Clean Code episode 3.
In Objective-C, when you are implementing a method that is going to perform a repetitive operation, for example, you need to choose between the several options that the language offers:
@interface FancyMutableCollection : NSObject { }
-(void)sortUsingSelector:(SEL)comparator;
// or ...
-(void)sortUsingComparator:(NSComparator)cmptr;
@end
I was wondering which one is better?
Objective-C provides many options: selectors, blocks, pointers to functions, instances of a class that conforms to a protocol, etc.
Sometimes the choice is clear, because only one method suits your needs, but what about the rest? I don't expect this to be just a matter of fashion.
Are there any rules to know when to use selectors and when to use blocks?
The main difference I can think of is that with blocks, they act like closures so they capture all of the variables in the scope around them. This is good for when you already have the variables there and don't want to create an instance variable just to hold that variable temporarily so that the action selector can access it when it is run.
With regard to collections, blocks have the added ability to be run concurrently if there are multiple cores in the system. Currently the iPhone does not have multiple cores, but the iPad 2 does, and it is probable that future iPhone models will too. Using blocks, in this case, would allow your app to scale automatically in the future.
In some cases, blocks are just easier to read as well, because the callback code is right next to the code that's calling it back. This is not always the case, of course, but sometimes it simply makes the code easier to read.
Sorry to refer you to the documentation, but for a more comprehensive overview of the pros/cons of blocks, take a look at this page.
As Apple puts it:
Blocks represent typically small, self-contained pieces of code. As such, they’re particularly useful as a means of encapsulating units of work that may be executed concurrently, or over items in a collection, or as a callback when another operation has finished.
Blocks are a useful alternative to traditional callback functions for two main reasons:
They allow you to write code at the point of invocation that is executed later in the context of the method implementation.
Blocks are thus often parameters of framework methods.
They allow access to local variables.
Rather than using callbacks requiring a data structure that embodies all the contextual information you need to perform an operation, you simply access local variables directly.
The one that's better is whichever one works better in the situation at hand. If your objects all implement a comparison selector that supports the ordering you want, use that. If not, a block will probably be easier.
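The trade-off can be sketched outside Objective-C. The following Python analogy is only that, an analogy: sort_using_selector looks a comparison method up by name, roughly like passing a SEL, while sort_using_comparator takes a closure, roughly like passing a block. All names here are invented for the example.

```python
from functools import cmp_to_key


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def compare_by_name(self, other):
        # Returns -1, 0 or 1, in the spirit of a compare: selector.
        a, b = self.name.lower(), other.name.lower()
        return (a > b) - (a < b)


def sort_using_selector(items, selector_name):
    # Selector style: the ordering must already exist as a method on the items.
    return sorted(
        items,
        key=cmp_to_key(lambda a, b: getattr(a, selector_name)(b)),
    )


def sort_using_comparator(items, comparator):
    # Block style: the ordering is supplied by the caller as a closure.
    return sorted(items, key=cmp_to_key(comparator))


people = [Person("carol", 41), Person("Bob", 29), Person("alice", 35)]

by_name = sort_using_selector(people, "compare_by_name")

target_age = 30  # a local variable captured by the closure below
closest_first = sort_using_comparator(
    people,
    lambda a, b: abs(a.age - target_age) - abs(b.age - target_age),
)
```

The second call shows the case where a closure is the easier choice: the ordering depends on target_age, a local value that no method on Person knows about.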
I have been learning Objective-C as my first language and understand Classes, Objects, instances, methods, OOP in general, etc enough to use the language and make simple applications work, but I wanted to check on a few fundamental questions that have never been explained in examples I followed.
I think the questions are so simple that they will confuse a lot of people, but I hope it will make sense to someone out there.
(While learning Objective-C the authors are assuming I have a basic computer programming background, yet I have found that a basic computer programming background is hard to come by since everyone teaching computer programming assumes you already have one to start teaching you something else. Hence the help with the fundamentals)
Passing and Returning:
When declaring methods with parameters, how does the parameter stuff actually work if the arguments being passed into the parameters can have different names than the parameter names? I hope that makes sense. I know parameter names are variables for that very reason, but...
are the arguments themselves getting mapped to a look up table or something?
Second the argument "types" (int for example) have to match the parameter return types in order for them to be passed into the method, and you always have to make your arguments values equal the parameter names somewhere else in your code listing before passing them into the method?
Is the following correct: After a method gets executed it returns a particular value (if it is not void) to the class or instances that is calling the method in the first place.
Is object oriented programming really just passing "your" Objects instance methods around with the system generated classes and methods to produce a result? If we are passing things to methods so they can do some work to them and then return something back why not do the work in the first place eliminating the need to pass anything? Theoretical question I guess? I assume the answer would be: Because that would be a crazy big tangled mess of a method with everything happening all at once, but I wanted to ask anyway.
Thank you for your time.
Variables are just places where values are stored. When you pass a variable as an argument, you aren't actually passing the variable itself — you're passing the value of the variable, which is copied into the argument. There's no mapping table or anything — it just takes the value of the variable and sticks it in the argument.
In the case of objects, the variable is a pointer to an object that exists somewhere in the program's memory space. In this case, the value of the pointer is copied just like any other variable, but it still points to the same object.
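A short sketch may make the copying visible. It is written in Python, whose argument passing resembles the object-pointer case described above (a reference is copied into the parameter); it is an analogy, not Objective-C, and the function names are invented.

```python
def rebind(number):
    # The parameter is a separate variable holding a copy of the value.
    # Reassigning it does not touch the caller's variable.
    number = number + 1
    return number


def append_item(items):
    # The parameter holds a copy of the reference, so it still points
    # to the same object the caller sees.
    items.append("added")


count = 10
result = rebind(count)   # the caller's variable has a different name: fine

names = ["first"]
append_item(names)       # the caller's list itself is modified
```

After these calls, count still holds 10 while result holds 11, and names contains two elements. The parameter names never need to match the argument names because only values travel across the call.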
(the argument "types" … have to match the parameter return types…) It isn't technically true that the types have to be the same, though they usually should be. Some types can be automatically converted to another type. For example, a char or short will be promoted to an int if you pass them to a function or method that takes an int. There's a complicated set of rules around type conversions. One thing you usually should not do is use casts to shut up compiler warnings about incompatible types: the compiler takes that to mean, "It's OK, I know what I'm doing," even if you really don't. Also, object types cannot ever be converted this way, since the variables are just pointers and the objects themselves live somewhere else. If you assign the value of an NSString* variable to an NSArray* variable, you're just lying to the compiler about what the pointer is pointing to, not turning the string into an array.
Non-void functions and methods return a value to the place where they're called, yes.
(Is object-oriented programming…) Object-oriented programming is a way of structuring your program so that it can be conceptually described as a collection of objects sending messages to each other and doing things in response to those messages.
(why not do the work in the first place eliminating the need to pass anything) The primary problem in computer programming is writing code that humans can understand and improve later. Functions and methods allow us to break our code into manageable chunks that we can reason about. They also allow us to write code once and reuse it all over the place. If we didn't factor repeated code into functions, then we'd have to repeat the code every time it is needed, which both makes the program code much longer and introduces thousands of new opportunities for bugs to creep in. 50,000-line programs would become 500 million-line programs. Not only would the program be horrendously bug-ridden, but it would be such a huge ball of spaghetti that finding the bugs would be a Herculean task.
By the way, I think you might like Uli Kusterer's Masters of the Void. It's a programming tutorial for Mac users who don't know anything about programming.
"If we are passing things to methods so they can do some work to them and then return something back why not do the work in the first place eliminating the need to pass anything?"
In the beginning, that's how it was done.
But then smart programmers noticed that they were repeating copies of some work and also running out of memory, so they decided to put that chunk of work in one central place to save memory, and then call it by passing in the data from where it was before.
They gave the locations, where the data was stuffed, names, because the programs were big enough that nobody memorized all the numerical address for every bit of data any more.
Then really really big computers finally got more than 16k of memory, and the programs started to become big unmanageable messes, so they codified the practice as part of structured programming. It's now a religious tenet.
But it's still done, by compilers when the inline flag is set, and also sometimes by hand on code that has to be really really fast on some very constrained processors by programmers who know when and where to make targeted trade-offs.
A little reading on the History of Computers is quite informative about how we got to where we are today, and why we do such strange things.
All those type checks are used (at most) only at compile time, to catch errors in the code.
At run time, every variable is just a block of memory that gets sent somewhere. For example, on a 32-bit platform both 'id' and 'int' are represented as a 4-byte raw value, so you can write (int)id and (id)int to convert one type to the other.
As for parameter names, the compiler uses them only to know which memory area to send some data to.
That's the simple explanation. The details are more complicated, but I think you'll get the main idea: during execution there are no variable names or types, and everything is done via operations on blocks of memory.
Ok, 2nd attempt at writing a Stack Overflow Question, so forgive me if this seems familiar.
I am rewriting an Excel Macro that was built over a 2 1/2 year period, frankenstein style (added to piecemeal). One of the things I need to do is load the data into an array once and only once for data accuracy and speed. For my skill level I am going to stick with the Array methodology.
My two approaches are:
Use Global dimmed dynamic Arrays
Dim the dynamic arrays in my Main procedure and pass them to the called procedures
So, what is Stack Overflow's take on the Pros vs Cons of these two methods?
Thanks,
Craig...
First, to answer the question you specifically didn't ask: Set up a custom class and load the data in that. Seriously, you'll thank me later.
OK, on to your question. I start by limiting the scope as much as possible. That means that I'm passing variables between procedures. When all your variables have the most restrictive scope possible, you run into the fewest problems down the line.
Once a variable passes two levels deep (calling procedure to 1st tier, 1st tier to 2nd tier), then I start taking a critical look at my structure. Usually (but not always) if all three procedures are in the same module, I'll create a module-level variable (use the Private keyword instead of Dim). If you separate your modules correctly (not arbitrarily) you can have module-level variables without much risk.
There are some variables that are always global right from the start: the variable that holds the app name and app version; the top-level class module that should never lose scope as long as the app is running; the constants (I know they're not variables) that hold things like commandbar names. I know I want these global, so they start that way.
I'm going to go out on a limb and say that module-level variables never migrate to global variables. Global variables start out that way because of their nature. If using a module-level variable seems cumbersome, it's probably because I've split a module up for no good reason or I need to rethink my whole framework.
That's not to say I've never cheated and used a global when I shouldn't have. We've all done it and you shouldn't lose any sleep if you do it too.
So to properly book-end this post: I quit using arrays unless I'm forced to. I use custom classes because
ActiveCell.Value = Invoice.LocalSalesTaxAmount
is so much nicer to debug than
ActiveCell.Value = aInvoice(35,2)
Just in case you think you need more skill to work with custom classes - so did I. I bit the bullet and so can anyone else.
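To show the same contrast outside VBA, here is a minimal sketch in Python. The Invoice fields and the tax rate are invented; the point is only that a named property documents itself while an array index does not.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    number: str
    subtotal: float
    local_sales_tax_rate: float

    @property
    def local_sales_tax_amount(self):
        # The calculation lives in one named place.
        return round(self.subtotal * self.local_sales_tax_rate, 2)


# Array style: the reader must remember what positions 1 and 2 mean.
invoice_row = ["INV-001", 200.0, 0.07]
tax_from_array = round(invoice_row[1] * invoice_row[2], 2)

# Class style: the same value, but the name explains itself.
invoice = Invoice("INV-001", 200.0, 0.07)
tax_from_object = invoice.local_sales_tax_amount
```

Both variables end up with the same amount, but only the second line tells a reader, or a debugger watch window, what the number is.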
You need to be careful with globals in Excel VBA, because if your application hits any kind of bug, and does some kind of soft reset (but the app still functions), then the globals will have been erased.
I had to give up on globals, since I don't write perfect apps.
I often find myself needing reference to an object that is several objects away, or so it seems. The options I see are passing a reference through a middle-man or just making something available statically. I understand the danger of global scope, but passing a reference through an object that does nothing with it feels ridiculous. I'm okay with a little bit passing around, I suppose. I suspect there's a line to be drawn somewhere.
Does anyone have insight on where to draw this line?
Or a good way to deal with the problem of distributing references amongst dependent objects?
Use the Law of Demeter (with moderation and good taste, not dogmatically). If you're coding a.b.c.d.e, something IS wrong -- you've nailed forevermore the implementation of a to have a b which has a c which... EEP!-) One or at the most two dots is the maximum you should be using. But the alternative is NOT to plump things into globals (and ensure thread-unsafe, buggy, hard-to-maintain code!), it is to have each object "surface" those characteristics it is designed to maintain as part of its interface to clients going forward, instead of just letting poor clients go through such unending chains of nested refs!
This smells of an abstraction that may need some improvement. You seem to be violating the Law of Demeter.
In some cases a global isn't too bad.
Consider, you're probably programming against an operating system's API. That's full of globals, you can probably access a file or the registry, write to the console. Look up a window handle. You can do loads of stuff to access state that is global across the whole computer, or even across the internet... and you don't have to pass a single reference to your class to access it. All this stuff is global if you access the OS's API.
So, when you consider the number of global things that often exist, a global in your own program probably isn't as bad as many people try and make out and scream about.
However, if you want to have very nice OO code that is all unit testable, I suppose you should be writing wrapper classes around any access to globals, whether they come from the OS or are declared by you, to encapsulate them. This means your class that uses this global state can get references to the wrappers, and they can be replaced with fakes.
Hmm, anyway. I'm not quite sure what advice I'm trying to give here, other than say, structuring code is all a balance! And, how to do it for your particular problem depends on your preferences, preferences of people who will use the code, how you're feeling on the day on the academic to pragmatic scale, how big the code base is, how safety critical the system is and how far off the deadline for completion is.
I believe your question is revealing something about your classes. Maybe the responsibilities could be improved? Maybe moving some code would solve problems?
Tell, don't ask.
That's how it was explained to me. There is a natural tendency to call classes to obtain some data. Taken too far, asking too much typically leads to heavy "getter sequences". But there is another way. I must admit it is not easy to find, but it gradually improves both the code at hand and the coder's habits.
Class A wants to perform a calculation, and asks B's data. Sometimes, it is appropriate that A tells B to do the job, possibly passing some parameters. This could replace B's "getName()", used by A to check the validity of the name, by an "isValid()" method on B.
"Asking" has been replaced by "telling" (calling a method that executes the computation).
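The getName()/isValid() example can be sketched as follows, in Python for brevity. The classes A and B come from the text; the validity rule (a non-blank name) and the method names on A are assumptions made up for the illustration.

```python
class B:
    def __init__(self, name):
        self._name = name

    def get_name(self):
        # "Ask" style: hands the raw data to whoever wants it.
        return self._name

    def is_valid(self):
        # "Tell" style: B answers the question itself, so the rule
        # stays next to the data it depends on.
        return bool(self._name.strip())


class A:
    def __init__(self, b):
        self._b = b

    def can_proceed_asking(self):
        # A pulls the data out and applies B's rule on B's behalf.
        return bool(self._b.get_name().strip())

    def can_proceed_telling(self):
        # A just tells B to do the check.
        return self._b.is_valid()
```

Both methods on A return the same result, but only the second one survives a change to the validity rule without A being edited.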
For me, this is the question I ask myself when I find too many getter calls. Gradually, the methods find their place in the correct object, and everything gets a bit simpler: I have fewer getters and fewer calls to them. I have less code, and it carries more meaning and aligns better with the functional requirements.
Move the data around
There are other cases where I move some data. For example, if a field moves two objects up, the length of the "getter chain" is reduced by two.
I believe nobody can find the correct model at first.
I first think about it (using hand-written diagrams is quick and a big help), then code it, then think again facing the real thing... Then I code the rest, and any smells I feel in the code, I think again...
Split and merge objects
If a method on A needs data from C, with B as a middle man, I can check whether A and C have something in common. Possibly, A or a part of A could become C (possible splitting of A, merging of A and C) ...
However, there are cases where I keep the getters of course.
But it's less likely a long chain will be created.
A long chain will probably get broken by one of the techniques above.
I have three patterns for this:
Pass the necessary reference to the object's constructor -- the reference can then be stored as a data member of the object, and doesn't need to be passed again; this implies that the object's factory has the necessary reference. For example, when I'm creating a DOM, I pass the element name to the DOM node when I construct the DOM node.
Let things remember their parent, and get references to properties via their parent; this implies that the parent or ancestor has the necessary property. For example, when I'm creating a DOM, there are various things which are stored as properties of the top-level DomDocument ancestor, and its child nodes can access those properties via the reference which each one has to its parent.
Put all the different things which are passed around as references into a single class, and then pass around just that one class instance as the only thing that's passed around. For example, there are many properties required to render a DOM (e.g. the GDI graphics handle, the viewport coordinates, callback events, etc.) ... I put all of these things into a single 'Context' instance which is passed as the only parameter to the methods of the DOM nodes to be rendered, and each method can get whichever properties it needs out of that context parameter.
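The three patterns can be combined in one small sketch. This is a hypothetical Python model, not the author's DOM code: Node, RenderContext and their fields are invented, and rendering is reduced to collecting indented names.

```python
from dataclasses import dataclass, field


@dataclass
class RenderContext:
    # Pattern 3: everything rendering needs travels in one object.
    viewport_width: int
    viewport_height: int
    output: list = field(default_factory=list)


class Node:
    def __init__(self, name, parent=None):
        self.name = name        # Pattern 1: passed once, at construction
        self.parent = parent    # Pattern 2: each node remembers its parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def document(self):
        # Pattern 2 in use: walk up to the top-level ancestor on demand.
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def render(self, context, depth=0):
        # Pattern 3 in use: the context is the only parameter passed down.
        context.output.append("  " * depth + self.name)
        for child in self.children:
            child.render(context, depth + 1)


root = Node("document")
body = Node("body", root)
paragraph = Node("p", body)

context = RenderContext(viewport_width=800, viewport_height=600)
root.render(context)
```

Adding a new rendering property then means adding one field to RenderContext instead of changing the signature of every render method.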