I was writing a class file and I included a CGPoint as an Ivar. This got me wondering about the overhead associated with smaller objective-c data structures. Is the memory footprint of something like a CGPoint significant enough to justify making a pointer to it, or would I just be making a pointer to 2 CGfloat values? For that matter, if all I need are x/y coordinates why not just stick 2 ints in as ivars?
On a related note, is there a nomenclature for describing tiny data structures, like "petty data structures", or "trivial data structures"; a word that describes a a struct made of a few primitives.
There are certainly issues with small objects, but there are supposedly several small-object optimizations in the system already.
In general, if you need an object, use an object. If not, then don't.
However, the bigger issue is to write your code so that it is easy to read by humans, and easy to maintain by humans. Use performance tools (like Instruments) to isolate places where system resource utilization needs to be addressed, and only then address those issues.
Of course, there are obvious stupid things to avoid, but in general, focus on a clean design, and easy to read/change implementation. Running the performance tools on your test suite should easily spot anything too wrong.
First don't worry about the overhead until you have a performance issue.
Most (but not all) object-oriented languages make a distinct between small, non-object, types and larger, object-based, ones; but the boundary is fuzzy. A useful distinction to decide how to represent a type is whether you think of it as a simple value, which you probably think as operated on or calculated with; or something more involved, which may be something which has inherent behaviour, operates on etc.
The built-in primitive types are mostly values: integers, characters, etc. In that vein complex numbers, fractions, coordinates etc. are simple values - use structures; trees, stacks, hierarchies are not - use objects.
On your related note consider many use "value", you can also consider "composite". "Petty" isn't a good choice, wrong connotations ;-) Others might say "basic", "trivial", "structure" or "record" - while the latter two can also be used to refer to very large types they often are used for small ones.
Related
I am making a Mathematics web program which allows the user to compute and prove various quantities or statements, e.g. determinant of a matrix, intersection of sets, determine whether a given map is a homomorphism. I decided to write the code using the OOP paradigm (in PHP, to handle some of the super heavy computations that a user's browser might not appreciate), since I could easily declare sets as Set objects, matrices as Matrix objects, etc. and keep some of the messy details of determining things such as cardinality, determinants, etc. in the background. However, after getting knee-deep in code, I'm wondering if deciding on OOP was a mistake. Here's why.
I'll use my Matrix class as a simple example. Matrix has the following attributes:
name (type String) (stores name of this matrix)
size (type array) (stores # rows and # columns of this matrix)
entries (type array) (stores this matrix's entries)
is_invertible (type Boolean) (stores whether this matrix can be inverted)
determinant (type Int) (stores the determinant of this matrix)
transpose (type array) (stores the transpose of this matrix)
Creating a new matrix called A would be done like so:
$A = new Matrix("A");
Now, in a general math problem concerning matrices, it could be that we know the matrix's name, size, entries, whether it's invertible, its determinant, or its transpose, or any combination of the above. This means that all of these properties need to be accessible, and certainly any of these properties can be changed by the user, depending on what's given in the problem. (I can give examples of problems for any of these cases, if needed.)
The issue I'm having, then, is that this would break the encapsulation "rule" of OOP ("rule" in quotes since, from what I understand, it's not a hard-and-fast rule, just one that should be upheld to the greatest extent possible). I did some searching on when getters and setters should be used, or even IF they should be used (seems odd to me that they wouldn't, in an OOP setting...), but this did not seem to help me much, as I found many contradictory answers and case-specific opinions.
So, my overall questions are: when the user needs access to modify many (if not all) of an object's attributes, but a class-oriented design seems to be ideal for addressing the programming problem,
Is OOP the best way to structure the code, despite essentially ignoring encapsulation altogether?
Is there an alternative to OOP which allows high user access while maintaining the OO "flavor" (i.e. keeping sets, matrices, etc. as objects)
Is it ok to break the encapsulation rule altogether once in a while, if the problem calls for it? Or is that not in the spirit of OOP?
What you are trying to do is not necessarily outside the scope of OOP. The thing is that you have a different model than what would usually be described in programming textbooks (where, for example, the values of the matrix would be always present and all of the functions could be simple methods). (Perhaps this is why the question was unfairly downvoted.) Nothing prevents you from storing values like "is_invertible" internally and implementing setter and getter methods. Doing this might make sense if you are trying to learn OOP. But I think other problems (see coding textbooks) might be easier for learning purposes. I see that a remote goal would be to capture some of mathematics as an OOP framework. But the whole mathematical universe is immensely richer than any fixed architecture (results like Gödel's theorem put a theoretical limit). You can only succeed in developing a framework for a very narrow application, for example solving certain equations. That's what symbolic algebra programs do: you can look at how, for example, SymPy or perhaps parts of Maple and Mathematica are implemented. In my view, the OOP paradigm can be both very useful and too restrictive / unnecessary depending on the task (you can certainly find more about shorcomings of OOP in Wikipedia or elsewhere). Also, your problem can be seen as writing a small programming language - in many of them you have sets, numbers, etc as objects.
You can use only rudimentary OOP or no OOP at all. You can use functional programming.
You should Google/read more about this on this or other sites. Is it OK to sometimes walk across the road when the red traffic light is on?
I have a very CPU intensive F# program that depends on persistent data-structures - about 40% of the total CPU time is spent in the Map module. So I thought I'd try out the PersistentHashMap in FSharpX collections. (BTW, this is already a big improvement over the previous version of F# in VS2013 where the same program spent 70% of its time in Map. I also notice that running programs with the debugger attached doesn't have the huge penalty it did before - good work guys...) There is also a hot-spot where I'm re-sorting all the time, where instead I should be adding to a Heap, so I thought I'd give that a go as well.
Two issue became immediately apparent:
(1) Swapping out one for the other from an interface perspective proved harder than it seems it should - I.e., making a shim that let me switch from a Map to a PersistentMap, preserving both the needed module-based let-bound functions and Types necessary to use the each map. I know that having full HM type-inference (and no type-classes) is orthogonal to LSP-style referential transparency for the most part - but maybe I was missing some way to do this better with a minimal amount of code.
(2) The biggest problem (which I'd like to focus on here) is the reliance of the F# functional data-structs on oo-style dispatched equality and comparison via the IComparison (when 't : comparison), etc., family of interfaces.
Even for OO programs ISTM that the idea of dispatching equality and comparison is a bad idea -- an object "knows" how to perform its own domain-specific tasks, but it doesn't "know" for the most part what notion of equality is going to be necessary at various points in the program for various purposes -- so equality/comparison should not be part of the object's interface, but when these concepts are needed, they should always be mentioned explicitly. For example, there should never be a .Sort(), only a .SortWith(...). One could argue that even something as basic as structural equality in F# could be explicit a.StructEq(b) or a ~= b - otherwise you always get object.Equals -- but even stipulating that doing things this way is the best for a multi-paradigm language that's a first-class .Net citizen, it seems like there should at least be the option of using passed-in comparison and equality functions, but this is not the case.
This means that: (a) type constraints are enforced even if you don't want them, causing ripples of broken inferred typing (and hundreds of wavy red lines with it being unclear where the actual "problem" is) and (b), that by implementing a notion of equality or comparison that makes one container type happy in one part of your program (and in my case I want to use the same container and item type with two different notions of ordering in two different places), it is likely to silently break (or cause inefficiency, if one subsumes the other) in other parts of the code that depended on the default/previous implementation.
The only way around this that I could think of is wrapping each item a adapter object using new...with object expression - but I really don't want to create so much garbage just to get the code to work.
So, ISTM that we could have a "pure" version of each persistent data struct that could be loaded if desired (even basics like List, etc.) that do not depend on dispatched equality/comparison/hashing and do not impose type constraints - all such needs should be via a passed in fn's at the time of the call. (Dispatched eq/cmp would be only for used for interop with BCL collections that don't accept delegates.) Then we could have a [EqCmpHashThrowNotImplemented] attribute, and I could be sure that there were no default operations happening at all, and I would feel better about the efficiency and predictability of my code. (And this also let's one change from a Record to a Class or visa-versa w/o worrying about any changes in behavior due to default implementations.) Again, this would be optional, but done by with a simple import. (Which does mean that each base core collection type would have to be broken out into its own module, which isn't really a bad idea anyway.)
If I've overlooked a better way to do things or there are some patterns people are using here, I'd be interested.
I'm just learning to program in scala.
I have some experience in functional programming, as I have in object oriented programming.
My question is kind of simple, yet tricky:
Which structures should be used in Scala? Should we only stick to immutables, eg. modifing lists by iterating through it and stick a new one together, or go for mutables? What is your opinion on that, what are the performance aspects, memory related aspects, ...
I'm likely to program in a functional style, but it often expands to an insane amount of effort to do things which are easily done by using mutables. Is it situation dependent, what to use?
Prefer immutable to mutable state. Use mutable state only where it is absolutely necessary. Some notable reasons include:
Performance. The standard libraries make wide use of vars and while loops, even though this is not idiomatic Scala. This should not be emulated, however, except for cases where you have profiled to determine that modifying the code to be more imperative will bring a significant performance gain.
I/O. I/O, or interacting with the outside world is inherently state dependent, and thus must be dealt with in a mutable manner.
This is no different than the recommended coding style found in all major languages, imperative or functional. For example, in Java it is preferable to use data objects with only private final fields. Code written in an immutable (and functional) way is inherently easier to understand because when one sees a val, they know it will never change, reducing the possible number of states any particular object or function can be in.
In many cases, it also allows automatic parallel execution, for example, collection classes in Scala all have a par function, which will return a parallel collection that automatically run the calls to functions like map or reduce in parallel.
(I thought this must be a duplicate but couldn't easily find an earlier similar one, so I venture to answer...)
There is no general answer to this question. The rule of thumb suggested by the creators of Scala is to start with immutable vals and structures and stick to them as long as it makes sense. You can almost always create a workable solution to your problem this way. But if not, of course be pragmatic and use mutability.
Once you have a solution, you can tweak it, test it, measure its performance etc. If you find that e.g. it is too slow or overly complex, identify the critical part of it, understand what makes it problematic and - if needed - reimplement it using mutable variables, ideally keeping it isolated from the rest of the program. Note though that in many cases, a better solution can be found from within the immutable realm as well, so try looking there first. Especially for a beginner like myself, it still happens regularly that the best solution I could come up with looked contorted and complex with no apparent way to improve it - until seeing a simple and elegant solution to the same problem in a few lines of code, created by an experienced Scala developer who controls more of the power of the language and its libraries.
I usually obey the following rules:
Never use static mutable vars
Keep all user defined data types (typically case classes) immutable unless they are very expensive to copy. This will simplify a lot of the application logic.
If a data structure/collection is inherently mutable (i.e. it's designed to change over time), using a mutable data structure/collection might be appropriate. An example might be a large game world that is updated when players move. Remember to (almost) never share these data structures between threads though.
It's fine to use mutable local vars in methods
Use immutable collections for function results. These can be strictly or lazily evaluated depending on what gives best performance in the used context. Be careful if you use a lazily evaluated result which depends on a mutable collection though.
I'm using OOP to write small games with different types of characters (e.g. platformers, shooters) that do different types of things. I typically try to spread out functionality into easily manageable, simple classes (e.g. an Environment class would perform common physics calculations for all its Inhabitants, so they don't need to worry about that). But, it seems that the more I refactor these programs to align with OOP principles, the heavier my character objects get. Since they're the ones with the important data, they use their own data to perform functions on themselves. This keeps them decoupled from things outside of their realm, but makes their classes seem to grow and grow. I'm totally comfortable with breaking these character classes down into more manageable components, but I worry that having many objects onscreen that are instantiated from classes with a lot of methods will result in a slow-running game.
1) Do the number of instance methods on an object directly impact its runtime performance?
2) Am I using OOP correctly if I end up with heavy character objects?
No. Or at least mostly no, anyway.
Maybe, but probably not.
For a character-based game, it's perfectly reasonable that a character would have a lot of associated data. Efficiency is rarely affected by representing that as a single "flat" collection of primitive objects, or a tree-like collection of a few large objects, each of which (recursively) has a number of smaller constituent parts.
As far as number of methods affecting performance: the number of methods can affect cache utilization, especially if you have (for example) lots of extremely small methods, and heavily-used methods are more or less interleaved with less used ones, so you end up with a lot of cache space devoted to less-used methods because they happen to be in the same cache line with something that's used more. Being methods affects this primarily because a compiler will typically arrange methods of the same class close to each other in memory, so sharing cache lines becomes more common. At least with typical implementations, however, calling a method will be O(1), so the number of methods doesn't directly affect speed.
No, its not what methods you have in an object, but what you do with them that increases runtime cost. Ofcourse there is a limit to this, but with current hardware you can completely forget about it. However, it is often questionable to go beyond a dozen or two members in a class from a design standpoint. Splitting your objects up doesn't need to incur any significant cost, you can inline all your getters and setters, and pass values by pointers and references. The compiler can flatten all your design decisions out and mostly the code from a "heavy" class is equivalent to code from a constellation of small classes
Correctly in this context is entirely dependant on the taste of the people developing the code. The processor doesn't care about what software engineering design decisions you make. If you wan't to make you objects all encompassing and it feels right to you then do it. There might come a point where things don't feel "right" to you, at that point you might split things up.
Java forbids operator overloading, but coming from C++ I do not see any reason for that. In languages where operator symbols are symbols as any other, same rules apply to "+" as to"plus" and there is no problem. So what is the point?
Edit: To be more concrete, show me which disadvantage overloaded "+" may have over overloaded "equals".
Just as many other things in Java, this is a restriction because it may be confusing if used improperly. (Similarly as pointer arithmetic is forbidden because it is error prone.) I'm a big fan of Java, but I'm generally of the opinion that it shouldn't be forbidden just because it could be misused.
For instance, BigInteger would benefit greatly from overloading the + operator.
OK, I'll try my hand at this under the assumption that Gabriel Ščerbák is doing this for better reasons than railing against a language.
The issue for me is one of manageable complexity: How much of the code in front of me do I have to decode vs. simply read?
In most conventional languages, upon seeing the expression a + b I know what is going to happen. The variables a and b will be added together. I'm pretty confident that behind the scenes the code will be very concise, very fast native machine code that adds the two numbers, whether the numbers are short integers or double-precision or some mixture of the two. (In some languages I may have to also assume that these could be strings being concatenated, but that's a rant for an entirely different question -- but one that flavours this rant if you peer at it from the right angle.)
When I make my own user-defined type -- say the omnipresent Complex type (and why Complex isn't a standard data type in modern languages is way the Hell beyond me, but that, again, is a rant for a different question) -- if I overload an operator (or, rather, if the operator is overloaded for me -- I'm using a library, say), short of peering very closely at the code I will not know that I'm now calling (possibly-virtual) methods on objects instead of having very tight, concise code generated for me behind the scenes. I will not know of the hidden conversions, the hidden temporary variables, the ... well, everything that goes along with writing many operators. To find out what's really going on in my code I have to pay very close attention to every line and keep track of declarations that may be three screens away from my current location in the code. To say that this impedes my understanding of the code flowing before my eyes is an understatement. Important details are being lost because the syntactic sugar is making things taste too tasty.
When I'm forced to use explicit methods on the objects (or even static methods or global methods where that applies) this is a signal to me, while I'm reading, that tells me of the potential cost overheads and bottlenecks and the like. I know, without even having to think for an instant, that I'm dealing with a method, that I've got dispatching overhead, that I may have temporary object creation and deletion overhead, etc. Everything's in front of me right before my eyes -- or at least enough indicators are in front of me that I know to be more careful.
I'm not intrinsically opposed to operator overloading. There are times when it makes code clearer, yes indeed, especially when you have complicated calculations over many baffling expressions. I can understand, however, exactly why someone might not want to put that into their language.
There is a further reason not to like operator overloading from the language designer's viewpoint. Operator overloading makes for very, very, very difficult grammars. C++ is already infamous for being nigh-unparseable and some of its constructs, like operator overloading, are the cause of it. Again from the viewpoint of someone writing the language I can fully understand why operator overloading was left off as a bad idea (or a good idea that's bad in implementation).
(This is all, of course, in addition to the other reasons you've already rejected. I'll submit my own overloading of operator-,() in my old C++ days in that stew just to be really annoying.)
There is no problem with operator overloading itself, but how it's actually has been used. As long as you overload the operators to make sense, the language still makes sense, but if you give other meanings to operators, it makes the language inconsistent.
(One example is how the shift left (<<) and shift right (>>) operators has been overloaded in C++ to mean "input" and "output"...)
So, the reasoning when leaving out operator overloading was probably that the risk of misuse was greater than the benefits of having operator overloading.
I think that Java would benefit greatly from extending its operators to cover built-in Number object types. Early (pre-1.0) versions of Java were said to have it (in that there were no primitives - everything was an object) but the VM technology of the time made it prohibitive from a performance view.
But in terms of in general allowing user defined operator overloading, it is not in the spirit of the Java language. The main problem is simply that it is hard to implement an operator that is consistent with what you expect from mathematics across object types and it will open the door to a lot of bad implementations which lead to a lot of hard to find (therefore expensive) bugs. You can just look at how many bad equals implementations (as in violate the contract) there are in general Java code, and the problem would only get worse from there.
Of course there are languages that prioritize power and syntactical beauty over such concerns, and more power to them. It is just not Java.
Edit: How is a custom + operator different than a custom == implementation (captured in Java in the equals(Object) method)? It isn't, really. It is just that by allowing operator overloading, things that are intuitive to a sixth grader become untrue. The real world experience of equals(Object) implementations shows how such complex contracts become hard to enforce in the real world.
Further Edit: Let me clarify the above, as I shortened it while editing and lost the point. A + operator in math has certain properties, one of which is that it doesn't matter which order the numbers on either side appear - it has the same result. So consider even the simplest case of a + performing an add to a Collection:
Collection a = ...
Collection b = ...
a + b;
System.out.println(a);
System.out.println(b);
The intuitive understanding of + would lead to an expectation that a + b or b + a would give the same result, but of course they would not. Start mixing two object types that take each other as paramaters in their plus method (say Collection and String) and things get harder to follow.
Now certainly it is possible to design operators on objects which are well understood and lead to better, more readable and more understandable code than without them. But the point is that more often than not in home-grown corporate APIs what you would end up seeing is obfuscated code.
There are a few problems:
Overloading logical operators has side effects because of lazy evaluation.
Even in mathematical types there are ambiguities, is (3dpoint*3dpoint) a cross or scaler product
You can't define new operators, so people reuse existing operators in novel ways eg. "string1%string2" to mean split string1 on string2.
But you can't always protect idiots from themselves even with an outright ban.
The point is that whenever you see, for example, a plus sign being used in the code, you know exactly what it does given that you know the types of its operands (which you always do in Java, as it is strongly typed).