The creator of the Clojure language claims that "open, and large, set of functions operate upon an open, and small, set of extensible abstractions is the key to algorithmic reuse and library interoperability". This obviously contradicts the typical OOP approach, where you create many abstractions (classes) and a relatively small set of functions operating on them. Please suggest a book, a chapter in a book, an article, or your personal experience that elaborates on these topics:
motivating examples of problems that appear in OOP and how using "many functions upon few abstractions" would address them
how to effectively do MFUFA* design
how to refactor OOP code towards MFUFA
how OOP languages' syntax gets in the way of MFUFA
*MFUFA: "many functions upon few abstractions"
There are two main notions of "abstraction" in programming:
1. parameterisation ("polymorphism", genericity),
2. encapsulation (data hiding).
[Edit: These two are duals. The first is client-side abstraction, the second implementer-side abstraction (and in case you care about these things: in terms of formal logic or type theory, they correspond to universal and existential quantification, respectively).]
In OO, the class is the kitchen sink feature for achieving both kinds of abstraction.
Ad (1), for almost every "pattern" you need to define a custom class (or several). In functional programming on the other hand, you often have more lightweight and direct methods to achieve the same goals, in particular, functions and tuples. It is often pointed out that most of the "design patterns" from the GoF are redundant in FP, for example.
Ad (2), encapsulation is needed a little bit less often if you don't have mutable state lingering around everywhere that you need to keep in check. You still build ADTs in FP, but they tend to be simpler and more generic, and hence you need fewer of them.
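To make (1) concrete: the GoF Strategy pattern needs an interface plus a class per behaviour, while a functional style passes the behaviour around as a plain function value. A minimal Java sketch (the discount names are made up for illustration):

    import java.util.function.DoubleUnaryOperator;

    // Classic GoF Strategy: an interface plus one class per behaviour.
    interface DiscountStrategy {
        double apply(double price);
    }

    class SeasonalDiscount implements DiscountStrategy {
        public double apply(double price) { return price * 0.9; }
    }

    // Functional style: the "strategy" is just a function value, no custom classes.
    class Checkout {
        static double total(double price, DoubleUnaryOperator discount) {
            return discount.applyAsDouble(price);
        }

        public static void main(String[] args) {
            System.out.println(total(100.0, new SeasonalDiscount()::apply)); // OO route
            System.out.println(total(100.0, p -> p * 0.9));                  // FP route: 90.0
        }
    }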
When you write a program in the object-oriented style, you put the emphasis on expressing the domain in terms of data types. At first glance this looks like a good idea: if we work with users, why not have a class User? And if users sell and buy cars, why not have a class Car? This way we can easily maintain data and control flow - it simply reflects the order of events in the real world. While this is quite convenient for domain objects, for many internal objects (i.e. objects that do not reflect anything in the real world, but occur only in the program logic) it is not so good.

Perhaps the best example is the number of collection types in Java. In Java (and many other OOP languages) there are both arrays and Lists. In JDBC there's ResultSet, which is also a kind of collection, but it doesn't implement the Collection interface. For input you will often use an InputStream, which provides an interface for sequential access to data - just like a linked list! However, it doesn't implement any kind of collection interface either. Thus, if your code works with a database and uses ResultSet, it will be harder to refactor it to work with text files and InputStream.
The MFUFA principle teaches us to pay less attention to type definitions and more to common abstractions. For this reason Clojure introduces a single abstraction for all the types mentioned above: the sequence. Any Iterable is automatically coerced to a sequence, streams are just lazy lists, and a result set can easily be transformed into one of the previous types.
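Clojure code would show this coercion directly; as a rough approximation in Java terms (all names here are hypothetical), the same spirit means writing the processing function once against a single sequence-like abstraction and adapting each source to it:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.stream.Stream;

    class Names {
        // Written once, against the common abstraction.
        static long countLong(Stream<String> names) {
            return names.filter(n -> n.length() > 10).count();
        }

        // Adapter: the lines of a text file are already a Stream<String>.
        static long countLongInFile(Path file) throws IOException {
            try (Stream<String> lines = Files.lines(file)) {
                return countLong(lines);
            }
        }

        // Adapter: a ResultSet has to be walked by hand before it fits.
        static long countLongInResultSet(ResultSet rs) throws SQLException {
            List<String> names = new ArrayList<>();
            while (rs.next()) {
                names.add(rs.getString("name"));
            }
            return countLong(names.stream());
        }
    }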
Another example is the use of the PersistentMap interface for structs and records. With such common interfaces it becomes very easy to create reusable subroutines without spending a lot of time on refactoring.
To summarize and answer your questions:
1. One simple example of an issue that appears frequently in OOP: reading data from many different sources (e.g. a DB, a file, the network, etc.) and processing it in the same way.
2. To make a good MFUFA design, try to make abstractions as common as possible and avoid ad-hoc implementations. E.g. avoid types à la UserList - List<User> is good enough in most cases.
3. Follow the suggestions from point 2. In addition, try to implement as many interfaces on your data types (classes) as possible. For example, if you really need a UserList (e.g. when it has to carry a lot of additional functionality), implement both the List and Iterable interfaces in its definition (see the sketch after this list).
4. OOP (at least in Java and C#) is not very well suited to this principle, because these languages try to encapsulate the whole of an object's behavior during the initial design, so it becomes hard to add more functions later. In most cases you can extend the class in question and put the methods you need into a new object, but 1) if somebody else implements their own derived class, it will not be compatible with yours; 2) sometimes classes are final or all fields are made private, so derived classes don't have access to them (e.g. to add new functions to class String one has to implement a separate class StringUtils). Nevertheless, the rules described above make it much easier to use MFUFA in OOP code. The best example here is Clojure itself, which is gracefully implemented in OO style but still follows the MFUFA principle.
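A minimal sketch of point 3, assuming a hypothetical UserList: by extending AbstractList, the type automatically becomes a List<User> and an Iterable<User>, so every existing function over those interfaces keeps working.

    import java.util.AbstractList;
    import java.util.ArrayList;
    import java.util.List;

    class User {
        final String name;
        User(String name) { this.name = name; }
    }

    // Extending AbstractList makes UserList a List<User> and, since List extends
    // Iterable, an Iterable<User> as well - so it plugs into existing code for free.
    class UserList extends AbstractList<User> {
        private final List<User> users = new ArrayList<>();

        @Override public User get(int index) { return users.get(index); }
        @Override public int size() { return users.size(); }
        @Override public void add(int index, User user) { users.add(index, user); }

        // The extra, domain-specific functionality that justified the class.
        boolean containsName(String name) {
            return users.stream().anyMatch(u -> u.name.equals(name));
        }
    }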
UPD. I recall another description of the difference between the object-oriented and functional styles that perhaps summarizes everything I said above: designing a program in OO style is thinking in terms of data types (nouns), while designing in functional style is thinking in terms of operations (verbs). You may forget that some nouns are similar (e.g. forget about inheritance), but you should always remember that many verbs in practice do the same thing (e.g. have the same or similar interfaces).
A much earlier version of the quote:
"The simple structure and natural applicability of lists are reflected in functions that are amazingly nonidiosyncratic. In Pascal the plethora of declarable data structures induces a specialization within functions that inhibits and penalizes casual cooperation. It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures."
...comes from the foreword to the famous SICP book. I believe this book has a lot of applicable material on this topic.
I think you're not getting that there's a difference between libraries and programmes.
OO libraries that work well usually provide a small number of abstractions, which programmes then use to build the abstractions for their domain. Larger OO libraries (and programmes) use inheritance to create different versions of methods and to introduce new methods.
So, yes, the same principle applies to OO libraries.
I'm using OOP to write small games of different types (e.g. platformers, shooters) with characters that do different kinds of things. I typically try to spread functionality out into easily manageable, simple classes (e.g. an Environment class would perform common physics calculations for all its Inhabitants, so they don't need to worry about that). But it seems that the more I refactor these programs to align with OOP principles, the heavier my character objects get. Since they're the ones with the important data, they use their own data to perform functions on themselves. This keeps them decoupled from things outside their realm, but makes their classes grow and grow. I'm totally comfortable with breaking these character classes down into more manageable components, but I worry that having many objects on screen that are instantiated from classes with a lot of methods will result in a slow-running game.
1) Do the number of instance methods on an object directly impact its runtime performance?
2) Am I using OOP correctly if I end up with heavy character objects?
1) No. Or at least mostly no, anyway.
2) Maybe, but probably not.
For a character-based game, it's perfectly reasonable that a character would have a lot of associated data. Efficiency is rarely affected by whether you represent that as a single "flat" collection of primitive objects or as a tree-like collection of a few large objects, each of which (recursively) is composed of a number of smaller constituent parts.
As far as number of methods affecting performance: the number of methods can affect cache utilization, especially if you have (for example) lots of extremely small methods, and heavily-used methods are more or less interleaved with less used ones, so you end up with a lot of cache space devoted to less-used methods because they happen to be in the same cache line with something that's used more. Being methods affects this primarily because a compiler will typically arrange methods of the same class close to each other in memory, so sharing cache lines becomes more common. At least with typical implementations, however, calling a method will be O(1), so the number of methods doesn't directly affect speed.
No, it's not what methods you have in an object but what you do with them that increases runtime cost. Of course there is a limit to this, but with current hardware you can completely forget about it. However, from a design standpoint it is often questionable to go beyond a dozen or two members in a class. Splitting your objects up doesn't need to incur any significant cost: you can inline all your getters and setters, and pass values by pointers and references. The compiler can flatten all your design decisions out, and in most cases the code from a "heavy" class is equivalent to the code from a constellation of small classes.
"Correctly" in this context is entirely dependent on the taste of the people developing the code. The processor doesn't care what software engineering design decisions you make. If you want to make your objects all-encompassing and it feels right to you, then do it. There might come a point where things don't feel "right" to you; at that point you can split things up.
http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
I've worked on several projects, and all of them used a database-centric design, which seemed to work fine.
It seems that the object-relational mismatch is a newly flourishing idea whose value has yet to be proven. Or am I wrong?
The idea of an object-relational mismatch comes from the problems that arise when you try to use an object-oriented programming approach backed by a relational database. The problem is that object models typically contain hierarchies of objects which need to be shredded into, and rebuilt from, multiple tables, rather than being stored as a whole.
However, the argument that normally comes up at this point is that if you haven't found a problem then it's your fault because you're not doing 'proper' object-orientation, and that you'll find the mismatch when you learn to do object-orientation 'properly'. And we all know that object-orientation is the only 'proper' development paradigm.
Oh, wait.
Many systems do not suit being modelled as object-oriented systems. In fact, for things like web applications which tend to have overall low complexity (with localised high complexity) and require high concurrency and scalability, using service-oriented and message-passing techniques can be a better option. When an application is written in this way, you tend to find that there isn't too much of an object relational mismatch because you don't use things like lazy loading and complex object hierarchies, and your objects are immutable so they don't need to be shredded back into the database.
So is there an object-relational mismatch? Yes, if you try and use object-oriented techniques with a relational database. But you can mitigate it by not using object-oriented techniques, if other approaches suit your application better.
If your domain model is simple, and has no deep inheritance, you may never feel the impedance mismatch.
As an example, let's say class Foo defines property foo. Bar subclasses Foo, and introduces a property bar. How do you store Foos and Bars in the database?
You could have a single table Foo that contains fields foo and bar... and every row that is a plain Foo will satisfy "bar IS NULL". But if your subclass introduces a whole bunch of properties, that's wasteful.
So maybe you have TWO tables Foo and Bar. Do you copy all the columns in Foo into Bar so you can load an entire Bar in one SELECT? Or do you have to JOIN Foo with Bar to load a Bar?
The impedance refers to the fact that you have to think about all these little details of how you "shred" (to use Greg Beech's term) an object into one or more tables.
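For what it's worth, standard JPA annotations spell out exactly these two choices. A hedged sketch using the Foo/Bar example above (classic javax.persistence API):

    import javax.persistence.*;

    @Entity
    // Option 1: a single table holds both Foos and Bars; plain Foo rows get bar IS NULL.
    @Inheritance(strategy = InheritanceType.SINGLE_TABLE)
    // Option 2: InheritanceType.JOINED instead gives two tables, and loading a Bar
    // then requires a JOIN of Foo with Bar.
    class Foo {
        @Id @GeneratedValue Long id;
        String foo;
    }

    @Entity
    class Bar extends Foo {
        String bar;
    }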
The object-relational impedance mismatch has to do with the difference between relational databases and object-oriented software models. If you don't see any mismatch, it's because your code isn't really doing OO properly.
When you start doing OO properly, and trying to map those relationships to an RDBMS, you'll understand the problem.
I'm concerned about what techniques I should use to choose the right objects in OOP.
Is there any must-read book about OOP in terms of how to choose objects?
Just write something that gets the job done, even if it's ugly, then refactor continuously:
eliminate duplicate code (don't repeat yourself)
increase cohesion
reduce coupling
But:
don't over-engineer; keep it simple
don't write stuff you ain't gonna need
It's not a precise recipe, just some general guidelines. Keep practicing.
P.S.
Code objects are not related to tangible real-life objects; they are just constructs that hold related information together.
Don't believe what the Java books/schools teach about objects; they're lying.
You probably mean "the right class", rather than "the right object". :-)
There are a few techniques, such as text analysis (a.k.a. underlining the nouns) and Class Responsibility Collaborator (CRC).
With "underlining the nouns", you basically start with a written, natural language (i.e. plain English) description of the problem you want to solve and underline the nouns. That gives you a list of candidate classes. You will need to perform several passes to refine it into a list of classes to implement.
For CRC, check out the Wikipedia article.
I suggest The OPEN Toolbox of Techniques for full reference.
Hope it helps.
I am assuming an understanding of what a struct, type, class, set, state, alphabet, scalar, vector and relationship are.
An object is a noun; a method is a verb. Object members can represent an identity, a state, or a scalar value per field. Relationships between objects are usually represented with references, where the references are members of objects. When relationships are complex, multidirectional, have an arity greater than 2, or represent some sort of grouping or containment, they can be expressed as objects themselves.
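For instance, a small Java sketch of a relationship with arity 3, expressed as an object of its own (all names hypothetical):

    // A ternary relationship - "a student takes a course in a given term" - cannot
    // be captured by a single reference between two objects, so the relationship
    // itself becomes an object.
    class Student {}
    class Course {}
    class Term {}

    class Enrollment {
        final Student student;
        final Course course;
        final Term term;

        Enrollment(Student student, Course course, Term term) {
            this.student = student;
            this.course = course;
            this.term = term;
        }
    }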
For other, broader technical reasons objects are most likely the only way to represent any form of information in OOP languages.
I am adding a second answer due to demian's comment:
"Sometimes the class is so obvious because it's tangible, but other times the concept of the object is too abstract, like a db connector."
That is true. My preferred approach is to perform a behavioural analysis of the system (using use cases, for example), and then derive system operations. Once you have a stable list of system operations (such as PrintDocument, SaveDocument, SpellCheck, MergeMail, etc. for a word processor) you need to assign each of them to a class. If you have developed a list of candidate classes with some of the techniques that I mentioned earlier, you will be able to allocate some of the operations. But some will remain unallocated. These will signal the need for more abstract or unintuitive classes, which you will need to make up, using your good judgment.
The whole method is documented in a white paper at www.openmetis.com.
You should check out Domain-Driven Design, by Eric Evans. It provides very useful concepts for thinking about the objects in your model, what their functions are in the domain, and how they can be organized to work together. It's not a cookbook, and probably not a beginner book - but then, I read it at different stages of my career, and every time I found something valuable in it...
The terms are often thrown around interchangeably, and there's clearly considerable overlap, but just as often it seems that people see something strongly implied by calling a system an ORM that isn't implied by calling it a DAL. What is that? What, if any, are the key points that differentiate these two types of system?
For example, let's say I have some code that implements Database, Table, Column and Row classes, populating them by automatic analysis of an existing database, allowing simplified interaction and so on. It understands, enforces, and takes advantage of structural relationships between database entities, such as foreign keys. All the entity models can be subclassed to load table-specific functionality onto them.
To what extent is this a DAL? To what extent is it an ORM? Why?
ORM = Object-Relational Mapping
In an ORM, classes/objects in the application are mapped to database tables and operations for persistence, sometimes automagically.
DAL = Data-Access Layer
In a DAL, database operations are hidden behind a code facade.
An ORM is a kind of DAL, but not all DALs are ORMs.
I think an ORM is capable of mapping any set of objects to a relational database; whereas a DAL is specific to your application, and probably couldn't naturally be extended to support other objects.
Not only that, but an ORM is specifically concerned with mapping classes to/from the database entities, while a DAL may simply be a way for you to access the data in a database, without any mapping.
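One way to picture the difference is a Java sketch along these lines (the schema and all names are made up): the DAL flavour hides the SQL behind a facade but hands back raw rows, while the ORM flavour additionally maps each row onto a domain object.

    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;

    class Customer {
        final long id;
        final String name;
        Customer(long id, String name) { this.id = id; this.name = name; }
    }

    class CustomerData {
        private final Connection conn;
        CustomerData(Connection conn) { this.conn = conn; }

        // DAL flavour: hides the database access behind a facade, but hands back
        // raw rows (and leaves the statement open for the caller to clean up -
        // exactly the kind of leak this style invites).
        ResultSet findByCity(String city) throws SQLException {
            PreparedStatement ps =
                conn.prepareStatement("SELECT id, name FROM customers WHERE city = ?");
            ps.setString(1, city);
            return ps.executeQuery();
        }

        // ORM flavour: additionally maps each row onto a domain object.
        List<Customer> loadByCity(String city) throws SQLException {
            List<Customer> result = new ArrayList<>();
            try (ResultSet rs = findByCity(city)) {
                while (rs.next()) {
                    result.add(new Customer(rs.getLong("id"), rs.getString("name")));
                }
            }
            return result;
        }
    }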
Any object-oriented DAL connecting to any storage system that does not save objects natively implements an ORM. ORM is generally understood to mean something like Hibernate, but the important thing is the handling of impedance mismatches.
[Expanded]
At the data level, impedance mismatches occur when you map data of one kind (relational) into data of another (OO).
For instance, how many times have you seen a line like below in your DAL?
db.AddInParameter(dbCommand, "Name", DbType.String, name);
Or the other side
customerId = Convert.ToInt64(dr["CustomerID"].ToString());
Many issues come up when mapping your primitive data types.
At the object level, your DAL should return the structures you intend to use, be it some sort of business object or just a bunch of raw data. Both your own DAL and an ORM need to handle this.
At the design level, the objects you construct reflect your stored data, so a structural difference can occur. These differences are also handled for you within ORM solutions, but you would be forced to handle the same within a DAL. For example, within your OO code it would be nice to implement proper inheritance, but that does not convert easily into something relational.
I just wanted to point out that ORM is a term coined to push products that automate a lot of what you would already have to do within your DAL. The ORM solutions will make life easier and provide a large number of quality/performance benefits. But that doesn't change the fact that one of the major components of your DAL is creating your own ORM.
ORM didn't exist when I started programming. When the first ORMs came out, they were external tools used to create the DAL. Nowadays, DAL and ORM have intermingled. That's why a lot of developers use the terms interchangeably.
The most well-known example of an ORM that functions as a DAL is NHibernate. Other examples are SubSonic and CSLA.NET. These are all .NET tools. IIRC, ORM tools started in the Java world. Other technology stacks then copied what Java had done.