Why do C# and Java bother with the "new" operator? - language-design

Why does the new operator exist in modern languages such as C# and Java? Is it purely a self-documenting code feature, or does it serve any actual purpose?
For instance the following example:
Class1 obj = new Class1();
Class1 foo()
{
return new Class1();
}
Is as easy to read as the more Pythonesque way of writing it:
Class1 obj = Class1();
Class1 foo()
{
return Class1();
}
EDIT: Cowan hit the nail on the head with the clarification of the question: Why did they choose this syntax?

It's a self-documenting feature.
It's a way to make it possible to name a method "Class1" in some other class.

Class1 obj = Class1();
In C# and Java, you need the "new" keyword because without it, it treats "Class1()" as a call to a method whose name is "Class1".

Its usefulness is as documentation: it is easier to distinguish object creation from method invocation than it is in Python.
The reason is historic, and comes straight from the C++ syntax.
In C++, "Class1()" is an expression creating a Class1 instance on the stack. For instance:
std::vector<int> a = std::vector<int>();
In this case, a temporary vector is created and copied into the vector a (compilers can usually elide the redundant copy).
Instead, "new Class1()" creates a Class1 instance on the heap, as in Java and C#, and returns a pointer to it. Unlike in Java and C#, that pointer has its own access syntax (-> rather than .). In fact, the meaning of new can be redefined to use any special-purpose allocator, which must still refer to some kind of heap so that the resulting object can be returned by reference.
Moreover, in Java/C#/C++, Class1() by itself could refer to any method or function, which would be confusing. Java coding conventions avoid that in practice, since they call for class names to start with an upper-case letter and method names to start with a lower-case one, and that is probably how Python avoids confusion in this case. A reader expects "Class1()" to create an object, "class1()" to be a function invocation, and "x.class1()" to be a method invocation (where 'x' can be 'self').
Finally, since in Python they chose to make classes be objects, and callable objects in particular, the syntax without 'new' would be allowed, and it would be inconsistent to allow having also another syntax.

The new operator in C# maps directly to the IL instruction called newobj which actually allocates the space for the new object's variables and then executes the constructor (called .ctor in IL). When executing the constructor -- much like C++ -- a reference to the initialized object is passed in as an invisible first parameter (like thiscall).
The thiscall-like convention allows the runtime to load and JIT all of the code in memory for a specific class only one time and reuse it for every instance of the class.
Java may have a similar opcode in its intermediate language, though I am not familiar enough to say.

C++ offers programmers a choice of allocating objects on the heap or on the stack.
Stack-based allocation is more efficient: allocation is cheaper, deallocation costs are truly zero, and the language provides assistance in demarcating object lifecycles, reducing the risk of forgetting to free the object.
On the other hand, in C++, you need to be very careful when publishing or sharing references to stack-based objects because stack-based objects are automatically freed when the stack frame is unwound, leading to dangling pointers.
With the new operator, all objects are allocated on the heap in Java or C#.
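The trade-off described above can be sketched in plain C, which offers the same two storage choices but has no constructors. This is only an illustration: the Point type and both function names are made up for the example.

```c
#include <stdlib.h>

/* Hypothetical type used only for this illustration. */
typedef struct {
    int x;
    int y;
} Point;

/* Automatic (stack) storage: no allocator call, and the object is
   released automatically when the function returns. Returning the
   struct by value copies it out, which is safe. */
Point make_point_on_stack(int x, int y)
{
    Point p = { x, y };
    return p;
}

/* Dynamic (heap) storage: the object outlives the call, but the caller
   must remember to free() it. Returns NULL if allocation fails. */
Point *make_point_on_heap(int x, int y)
{
    Point *p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}
```

Returning the address of p from the first function, instead of a copy, would hand the caller a dangling pointer, which is the hazard mentioned above.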
Class1 obj = Class1();
Actually, the compiler would try to find a method called Class1().
E.g. the following is a common Java bug:
public class MyClass
{
    // Oops, this has a return type, so it's a method, not a constructor!
    // Because no constructor is defined, Java will add a default one.
    // init() will not get called if you do new MyClass();
    public void MyClass()
    {
        init();
    }

    public void init()
    {
        ...
    }
}
Note: "all objects are allocated on the heap" does not mean stack allocation is not used under the hood occasionally.
For instance, in Java, HotSpot optimizations based on escape analysis can avoid heap allocation.
This analysis, performed by the runtime compiler, can conclude, for example, that an object is referenced only locally in a method and that no reference can escape from this scope. If so, HotSpot can apply runtime optimizations: it can keep the object on the stack or in registers instead of on the heap.
Such optimizations are not always considered decisive, though...

The reason Java chose it was because the syntax was familiar to C++ developers. The reason C# chose it was because it was familiar to Java developers.
The reason the new operator is used in C++ is probably that, with manual memory management, it is very important to make it clear when memory is allocated. While the Pythonesque syntax could work, it makes it less obvious that memory is allocated.

The new operator allocates the memory for the object(s), which is its purpose; as you say, it also self-documents which instance(s) (i.e. a new one) you're working with.

As others have noted, Java and C# provide the new syntax because C++ did. And C++ needed some way to distinguish between creating an object on the stack, creating an object on the heap, or calling a function or method that returned a pointer to an object.
C++ used this particular syntax because the early object-oriented language Simula used it. Bjarne Stroustrup was inspired by Simula, and sought to add Simula-like features to C. C had a function for allocating memory, but didn't guarantee that a constructor was also called.
From "The Design and Evolution of C++," 1994, by Bjarne Stroustrup, page 57:
"Consequently, I introduced an operator to ensure that both allocation and initialization was done:
monitor* p = new monitor;
The operator was called new because that was the name of the corresponding Simula operator. The new operator invokes some allocation function to obtain memory and then invokes a constructor to initialize that memory. The combined operation is often called instantiation or simply object creation: it creates an object out of raw memory.
The notational convenience offered by operator new is significant. ..."
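To make the "allocate, then initialize" pairing from the quotation concrete, here is a rough C sketch of what such an operator bundles together. The Monitor type and the monitor_init and monitor_new names are invented for this example; they are not taken from the book.

```c
#include <stdlib.h>

/* Hypothetical type standing in for the "monitor" in the quotation. */
typedef struct {
    int locked;
    int waiters;
} Monitor;

/* Plays the role of a constructor: turns raw memory into a valid object. */
static void monitor_init(Monitor *self)
{
    self->locked = 0;
    self->waiters = 0;
}

/* Plays the role of "new": allocate, then initialize, in one step.
   Returns NULL if allocation fails. */
Monitor *monitor_new(void)
{
    Monitor *m = malloc(sizeof *m);
    if (m != NULL) {
        monitor_init(m);
    }
    return m;
}
```

With a bare malloc call nothing forces the caller to run monitor_init, which is the gap the quotation describes.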

In addition to the remarks above: AFAIK, early drafts for Java 7 planned to remove the new keyword, but that plan was later cancelled.

Related

Do Objective-C objects get their own copies of instance methods?

I'm new to Objective-C and was wondering if anyone could provide any information to clarify this for me. My (possibly wrong) understanding of object instantiation in other languages is that the object will get its own copies of instance variables as well as instance methods, but I'm noticing that all the literature I've read thus far about Objective-C seems to indicate that the object only gets copies of instance variables, and that even when calling an instance method, program control reverts to the original method defined inside the class itself. For example, this page from Apple's developer site shows program flow diagrams that suggest this:
https://developer.apple.com/library/mac/documentation/cocoa/conceptual/ProgrammingWithObjectiveC/WorkingwithObjects/WorkingwithObjects.html#//apple_ref/doc/uid/TP40011210-CH4-SW1
Also in Kochan's "Programming in Objective-C", 6th ed., pg. 41, referring to an example fraction class and object, the author states that:
"The first message sends the setNumerator: message to myFraction...control is then sent to the setNumerator: method you defined for your Fraction class...Objective-C...knows that it's the method from this class to use because it knows that myFraction is an object from the Fraction class"
On pg. 42, he continues:
"When you allocate a new object...enough space is reserved in memory to store the object's data, which includes space for its instance variables, plus a little more..."
All of this would seem to indicate to me that there is only ever one copy of any method, the original method defined within the class, and when calling an instance method, Objective-C simply passes control to that original copy and temporarily "wires it" to the called object's instance variables. I know I may not be using the right terminology, but is this correct? It seems logical as creating multiple copies of the same methods would be a waste of memory, but this is causing me to rethink my entire understanding of object instantiation. Any input would be greatly appreciated! Thank you.
Your reasoning is correct. The instance methods are shared by all instances of a class. The reason is, as you suspect, that doing it the other way would be a massive waste of memory.
The temporary wiring you speak of is that each method has an additional hidden parameter passed to it: a pointer to the calling object. Since that gives the method access to the calling object, then it can easily access all of the necessary instance variables and all is well. Note that any static variable exists in only a single instance as well and if you are not aware of that, unexpected things can happen. However, regular local variables are not shared and are recreated for each call of a method.
Apple's documentation on the topic is very good, so have a look for more info.
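The shared-method idea in this answer can be sketched in plain C. This is a simplified picture rather than what the Objective-C compiler actually emits; the Fraction struct and the function names are made up for the illustration.

```c
/* Hypothetical stand-in for the Fraction class from the question. */
typedef struct {
    int numerator;
    int denominator;
} Fraction;

/* One copy of this code serves every Fraction. The explicit self
   parameter is what Objective-C passes implicitly. */
void fraction_set_numerator(Fraction *self, int n)
{
    self->numerator = n;
}

/* A static local exists once for all callers, so it counts calls made
   on any instance; an ordinary local would start fresh on each call. */
int fraction_touch(Fraction *self)
{
    static int total_calls = 0;
    (void)self; /* the receiver is not needed for the count */
    total_calls++;
    return total_calls;
}
```

Two Fraction values each carry their own numerator and denominator, yet both go through the same fraction_set_numerator code, and fraction_touch keeps a single count no matter which instance it is called on.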
Just think of a method as a set of instructions. There is no reason to have a copy of the same method for each object. I think you may be mistaken about other languages as well. Methods are associated with the class, not individual objects.
Yes, your thinking is more or less right (although it's simpler than that: behind the scenes in most such languages methods don't need to be "wired" to anything, they just take an extra parameter for self and insert struct lookups before references to instance variables).
What might be confusing you is that not all languages work this way, in their implementations and semantically. Object-oriented languages are (very roughly) divided into two camps: class-based, like Objective-C, and prototype-based, like JavaScript. In the second camp, a method or procedure really is an object in its own right and can often be assigned directly to an object's instance variables as well. There are no classes to look up methods from, only objects and other objects, all with the same first-class status (this is an oversimplification; good languages still allow for sharing and efficiency).

How are data members stored in an object?

I know that in Objective-C, every object has its first 4 bytes (depending on the type of processor) as an isa pointer that tells which class it belongs to and which dispatch table to use to resolve a selector to the address of a function.
What I wanted to know was how data members are stored and accessed in these methods.
self is passed as an implicit object in each function being called.
We use setters and getters to handle data members in other member functions as good practice,
but when we refer directly to a data member in an initializer or an accessor, how is it accessed? Is it replaced by some address at compile time, or something else?
Actually, AFAIK the memory layout is implementation specific, but http://algorithm.com.au/downloads/talks/objective-c-internals/objective-c-internals.pdf should give you a pretty good idea of the inner workings of object data and object messaging.
When you use a direct member access, what basically happens is that you're fetching straight from the "struct" that is your actual object. That is, the compiler is basically just adding an offset to the address of your object/struct and reading the contents of that memory address.
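As a rough picture of the "address plus offset" read described here, consider the following C sketch. The ColorObject layout is hypothetical and only mimics an object with an isa pointer followed by two ivars; it is not the real Objective-C object layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical object layout: a class pointer followed by the ivars. */
typedef struct {
    void *isa;
    int   red;
    int   green;
} ColorObject;

/* What the compiler does for you when you write obj->green. */
int read_green_direct(const ColorObject *obj)
{
    return obj->green;
}

/* The same read spelled out: object address plus a fixed byte offset. */
int read_green_by_offset(const ColorObject *obj)
{
    const unsigned char *base = (const unsigned char *)obj;
    int value;
    memcpy(&value, base + offsetof(ColorObject, green), sizeof value);
    return value;
}
```

Both functions return the same value for the same object; the second one just makes the offset arithmetic visible.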
Maybe I should add that this is reverse engineered from Xcode and not written in any specification I can find, so depending on this behavior is most likely a bad idea. Since external access to the ivars is not allowed, the decision is basically up to the compiler and could be changed at any time.
Edit: as @FrederickCheung points out, Objective-C 2.0 may have changed this behavior.
It's not as simple as a compile-time offset calculation, at least not in Objective-C 2.0 on the 64-bit OS X and iOS runtimes. By adding a layer of indirection, these support things like superclasses changing their instance variable layout without breaking subclasses that were compiled against the old layout.
The runtime API docs describe the API one can use to set instance variables and so on, but don't elaborate on the implementation.
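One way to picture that layer of indirection is an offset that lives in a variable instead of being baked into the compiled code. The C sketch below is only an illustration of the idea under that assumption; the two layouts, the width_offset variable and read_width are all made up, and the real runtime's data structures are not shown.

```c
#include <stddef.h>
#include <string.h>

/* Layout the subclass code was compiled against (hypothetical). */
typedef struct {
    void *isa;
    int   width;   /* the ivar the compiled code wants to read */
} OldLayout;

/* Layout after the superclass grew a new ivar in a later release. */
typedef struct {
    void *isa;
    long  added_by_superclass;
    int   width;
} NewLayout;

/* Instead of baking the offset into the instructions, the compiled code
   reads it from a variable that a runtime could update at load time. */
size_t width_offset = offsetof(OldLayout, width);

int read_width(const void *obj)
{
    int value;
    memcpy(&value, (const unsigned char *)obj + width_offset, sizeof value);
    return value;
}
```

If the layout changes, only width_offset has to be updated (for example to offsetof(NewLayout, width)); read_width itself does not need to be recompiled.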

Why is everything a pointer in Objective-C

I come from a PHP/Javascript background where things are stored in a variable directly in most cases, where we also had Object/Classes/Methods, etc. It was OOP.
Now I'm starting to learn Objective-C. I understand the basics of pointers. But everything is a pointer now. This is the part that I don't get. Why can't we use direct assignment as in PHP/JavaScript? We are still doing OOP after all.
Thanks
If you look at the semantics of JavaScript and many other OO languages (perhaps including PHP, but I'm not sure and not willing to guess), you'll see that these languages offer the same indirection Objective C offers through pointers. In fact, internally these languages use pointers everywhere. Consider this (JavaScript) snippet:
function f(obj) {
obj.x = 1; // modifies the object referred to directly
obj = {x: 2}; // doesn't affect caller
}
var foo = {x: 0};
f(foo); // passes a pointer/"reference"
// foo.x === 1
It's roughly equivalent to (C as I don't know Objective C) something like this, modulo manual memory management, static typing, etc.:
struct Obj { int x; };
void f(struct Obj *obj) {
obj->x = 1;
obj = ...; // you get the idea
}
struct Obj *foo = malloc(sizeof(*foo));
foo->x = 0;
f(foo);
free(foo);
It's just explicit in Objective-C because that language is a superset of C (100% backwards compatibility and interoperability), while other languages have done away with explicit pointers and made the indirection they need implicit.
In PHP you also work only with pointers, but transparently.
In reality you are using references to objects.
The reasons why the designers of Objective-C decided to use pointers for everything that is an Objective-C object include the following:
So they can deal with behind-the-scenes memory management without taking away the programmer's ability to do so on his own.
Fast Enumeration on objects.
(Perhaps the most important) It gives the ability to have id types that can pass nil (null) values without crashing the program.
To build on the other answers here: in PHP and other languages you are still using pointers. That is why there is still a distinction between passing by reference and passing by value. There are several good sites that help explain the distinction, both in syntax and what it means to pass by either method.
Edit:
Refer to the second link in my post. My interpretation of that information is that PHP passes by value by default. Adding the ampersand in front of the variable during the function call passes a reference (or rather the address of the variable). In essence, passing by reference is passing a pointer while passing by value does a copy of the value completely. They also have different implications on their usage (reference allows modifying the original variable's value, even outside the scope of the function etc).
Objective C is a strict superset and extension of ANSI C, so the native types that could be compatibly added to the language were constrained (perhaps by the original implementation). But this compatibility with ANSI C has turned out to be one of the advantages of using Objective C mixed with the reuse of cross-platform C code.
BTW, OOP and "safety" are nearly orthogonal concepts. They each have different potential costs in terms of consuming CPU cycles and/or eating the user's battery power.
Objects are created using the +alloc method, which allocates space for the new object on the heap. In C, and therefore in Objective-C, the only way to refer to memory on the heap is through a pointer.

How does an Objective-C method have access to the callee's ivars?

I was reading Apple's documentation, The Objective-C Programming Language (PDF link). On pg. 18, under The Receiver’s Instance Variables, I saw this.
A method has automatic access to the receiving object’s instance
variables. You don’t need to pass them to the method as parameters.
For example, the primaryColor method illustrated above takes no
parameters, yet it can find the primary color for otherRect and return
it. Every method assumes the receiver and its instance variables,
without having to declare them as parameters.
This convention simplifies Objective-C source code. It also supports
the way object-oriented programmers think about objects and messages.
Messages are sent to receivers much as letters are delivered to your
home. Message parameters bring information from the outside to the
receiver; they don’t need to bring the receiver to itself.
I am trying to better understand what they are describing; is this like Python's self parameter, or style?
Objective-C is a strict superset of C.
So Objective-C methods are "just" function pointers, and instances are "just" C structs.
A method has two hidden parameters. The first one is self (the current instance), the second is _cmd (the method's selector).
But what the documentation is describing on page 18 is the access to the class instance variables from a method.
It just says a method of a class can access the instance variables of that class.
It's pretty basic from an object-oriented perspective, but not from a C perspective.
It also says that you can't access instance variables from another class instance, unless they are public.
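The hidden-parameter idea from this answer can be modeled as a toy in plain C. This is not the real Objective-C runtime: the Class and Object structs, the string selectors and send_message are all simplifications invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

typedef struct Object Object;

/* Every "method" has the same hidden leading parameters. */
typedef int (*Method)(Object *self, const char *cmd);

typedef struct {
    const char *selector;
    Method      imp;
} MethodEntry;

typedef struct {
    const MethodEntry *methods;
    size_t             count;
} Class;

struct Object {
    const Class *isa;   /* which class the instance belongs to */
    int primary_color;  /* an instance variable */
};

/* The method reaches the ivar through self; nothing else is passed in. */
static int primary_color_imp(Object *self, const char *cmd)
{
    (void)cmd;
    return self->primary_color;
}

static const MethodEntry rect_methods[] = {
    { "primaryColor", primary_color_imp },
};

const Class RectClass = { rect_methods, 1 };

/* Toy message send: look the selector up in the receiver's class, then
   call the function with the receiver and selector. Returns -1 if the
   class has no such method. */
int send_message(Object *receiver, const char *selector)
{
    const Class *cls = receiver->isa;
    for (size_t i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].selector, selector) == 0) {
            return cls->methods[i].imp(receiver, selector);
        }
    }
    return -1;
}
```

The caller only names the receiver and the selector; the method finds the receiver's instance variables through self, which is the "automatic access" the quoted documentation describes.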
While I would not say that it is a "slam" against Python, it is most certainly referring to the Python style of Object Orientation (which, in honesty, is derived from the "pseudo-object orientation" available in C (whether it is truly OO or not is a debate for another forum)).
It is good to remember that Python has a very different concept of scope from the rest of the world — each method more or less exists in its own little reality. This is contrasted with more "self-aware" languages which either have a "this" variable or an implicit instance construct of some form.

Objective-c: Objects by value / Structs with methods / How can I get something like that?

I'm starting to code in objective-c and I've just realized that objects can only be passed by reference.
What if I need an object to use static memory by default and to be copied instead of referenced?
For example, I have an object Color with 3 int components r, g and b. I don't want these objects to be in dynamic memory and referenced when passing to functions; I want them immutable and copied like an int or a float.
I know I can use a c struct, but I also need the object Color to have methods that gets/sets lightness, hue, saturation, etc. I want my code to be object oriented.
Is there any solution to this?
EDIT: If, for example, I'm building a 3D game engine, where I'll have classes like Vector2, Vector3, Matrix, Ray, Color, etc.: 1) I need them to be mutable. 2) The size of the objects is roughly the same as the size of a pointer, so why would I be copying pointers when I can copy the object? It would be simpler and more efficient, and I wouldn't need to manage memory, especially in methods that return colors. And in the case of a game engine, efficiency is critical.
So, if there is no solution to this... Should I use c-structs and use c-function to work on them? Isn't there a better choice?
Thanks.
You can't do this. This isn't how Objective-C works (at least the Apple/GNU version*). It simply isn't designed for that sort of extreme low-level efficiency. Objects are allocated in dynamic memory and their lifetimes are controlled by methods you call on them, and that's just how it works. If you want more low-level efficiency, you can either use plain C structs or C++. But keep in mind that worrying about this is pointless in 99% of circumstances — the epitome of premature optimization. Objective-C programs are generally very competitive with C++ equivalents both in execution speed and memory use despite this minor inefficiency. I wouldn't go for a more difficult solution until profiling had proved it to be necessary.
Also, when you're new to Objective-C, it's easy to psych yourself out over memory management. In a normal Cocoa (Touch) program, you shouldn't need to bother about it too much. Return autoreleased objects from methods, use setters to assign objects you want to keep around.
*Note: There was an old implementation of Objective-C called the Portable Object Compiler that did have this ability, but it's unrelated to and incompatible with the Objective-C used on Macs and iOS devices. Also, the Apple Objective-C runtime includes special support for Blocks to be allocated on the stack, which is why you must copy them (copy reproduces the block in dynamic memory like a normal object) if you want to store them.
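For the plain-C-struct route mentioned in this answer, a minimal sketch might look like the following. The Color layout follows the question's description; the function names are hypothetical, and color_average is just a stand-in for a real lightness calculation.

```c
/* Hypothetical value type: three int components, as in the question. */
typedef struct {
    int r;
    int g;
    int b;
} Color;

/* Passed and returned by value, so the caller's Color is never modified. */
Color color_with_red(Color c, int red)
{
    c.r = red;   /* changes only the local copy */
    return c;
}

/* A rough brightness measure (plain average of the components). */
int color_average(Color c)
{
    return (c.r + c.g + c.b) / 3;
}
```

Because Color is copied on every call, there is nothing to allocate or free, at the cost of giving up Objective-C method syntax for these types.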
What if I need an object to use static memory by default and to be copied instead of referenced?
You don't.
Seriously. You never need an object to use static memory or be allocated on the stack. C++ allows you to do it, but no other object oriented language I know does.
For example, I have an object Color with 3 int components r, g and b. I don't want these objects to be in dynamic memory and referenced when passing to functions; I want them immutable and copied like an int or a float.
Why do you not want the objects to be in dynamic memory? What advantage do you think static memory gives you?
On the other hand, it's easy to make Objective-C objects immutable. Just make the instance variables private and don't provide any methods that can change them once the object is initialised. This is exactly how the built-in immutable classes work, e.g. NSArray, NSString.
One solution that people use sometimes is to use a singleton object (assuming you only need one of the objects for your entire app's lifetime). In that case, you define a class method on the class and have it return an object that it creates once when it is first requested. So you can do something like:
@implementation MyObject
+ (MyObject *)sharedObjectInstance
{
    // The static pointer is set once and shared by every caller.
    static MyObject *theObject = nil;
    if (theObject == nil)
    {
        theObject = [[MyObject alloc] init];
    }
    return theObject;
}
@end
Of course the object itself isn't what's being statically allocated, it's the pointer to the object that's statically allocated, but in any case the object will stick around until the application terminates.
There are times when you want to do this because you really only want one globally shared instance of a particular object. However, if that's not your objective, I'm not sure why you'd want to do what you're describing. You can always use the -copy method to create a copy of an object (assuming the object conforms to the NSCopying protocol) to manipulate without touching the original.
EDIT: Based on your comments above it seems you just want to have immutable objects that you can copy and modify the copies. So using -copy is probably the way to go.