I have always wondered whether I am thinking ahead too much or too little before I code something. This is especially true when I am not sure what future requirement changes I will need to account for. I don't know how flexible or abstracted I should make my classes. I'll give a quick example.
You want to write a program that plays blackjack against a computer, and you're the type of person who likes to experiment. You begin to write the code for the deck, but then you realize blackjack could be played with 1, 2, 4, or any number of decks. You account for that, but then you realize that the deck might be altered to contain no cards of value ten. You then decide the deck should be completely versatile, allowing any number of suits or ranks. Then you decide even the rule that the number of suits multiplied by the number of unique ranks equals the total number of cards in the deck should be alterable... You can see where I am going here.
My question is this: are there any guidelines for how flexible a class should be?
Favor minimalism and encapsulation, avoiding functionalities you don't need.
It's of course good to design based on needs, but cluttering designs with things you don't use, or might only possibly use in the future, should be minimized. It's fine to consider and implement what you are sure you will need.
When you understand and specify a 'future problem' (specifically, at that point in the future), you will often solve it differently from today's solution.
Check out the great paper "On the Criteria to be Used in Decomposing Systems into Modules" by David Parnas, from back in 1972.
Generally speaking, you should try to identify areas of responsibility that can be pushed behind a very simple interface that hides useful functionality and complexity. You should strive to separate the what from the how in areas you feel are most likely to change (i.e. predicting variation).
Flexibility is indeed a requirement for an application or system to be maintainable. Usually I find that a design following the SOLID design principles, TDD, and stateless business logic is easier to maintain.
Among all the SOLID principles, I find the [S]RP is the rule that most makes an application maintainable. Following the [S]RP, your system will be broken down into smaller pieces with replaceable classes. Say it can be broken down into Deck, DeckRule, HitAction, etc.
Interfaces or inheritance will help, since you can easily swap your Deck with a NoTenDeck or SpadeOnlyDeck, and swap the DeckRule for a HardToWinDeckRule or ImpossibleWinDeckRule. The decorator pattern and other design patterns such as composite will also help to make your system flexible. And don't forget unit tests; they will help you refactor the code.
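As a rough sketch of that idea (the IDeck, StandardDeck, and NoTenDeck members below are illustrative, not prescribed by this answer), the game can depend on an abstraction while deck variants stay swappable:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public record Card(string Suit, int Rank);

    // The game talks only to this abstraction.
    public interface IDeck
    {
        Card Draw();
    }

    // Standard deck; deckCount covers the 1, 2, 4, ... decks case,
    // and an optional filter covers altered decks.
    public class StandardDeck : IDeck
    {
        private static readonly string[] Suits =
            { "Hearts", "Diamonds", "Clubs", "Spades" };
        private readonly Stack<Card> cards;

        public StandardDeck(int deckCount = 1, Func<Card, bool> filter = null)
        {
            var all = from _ in Enumerable.Range(0, deckCount)
                      from suit in Suits
                      from rank in Enumerable.Range(1, 13)
                      select new Card(suit, rank);
            if (filter != null) all = all.Where(filter);
            var rng = new Random();
            cards = new Stack<Card>(all.OrderBy(_ => rng.Next()));
        }

        public Card Draw() => cards.Pop();
    }

    // A deck variant is just a different construction, not a new game.
    public class NoTenDeck : StandardDeck
    {
        public NoTenDeck(int deckCount = 1)
            : base(deckCount, c => c.Rank != 10) { }
    }

Swapping in IDeck deck = new NoTenDeck(2); then changes the deck's behavior without touching the game logic.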
And sometimes you will need something like a breaking change, in which you tear down your current architecture and interfaces and replace them with another design. Sometimes it is needed, but mostly it is not.
You can find several discussions in Stack Overflow answers about DI vs. Singleton, and a little about stateful vs. stateless design.
I try to follow the agile principle of YAGNI: You Ain't Gonna Need It!
It isn't worth the effort of coming up with all these possible future requirements. There are an infinite number of possible future requirements. You can't account for all of them. Just do what you need to do to fulfill the requirements you already have.
If in the future you get new requirements, THEN change the system. (You do have good tests to make sure you don't break anything during refactoring, right?)
Overall thoughts on flexibility
From your description of the problem I don't think your classes should be flexible as in "keep throwing every new aspect of the game rules into the same class". A class with too many responsibilities is fragile, hard to maintain, and thus hard to change; ironically, treating a class as if it were flexible will eventually make it rigid! Don't put all your eggs in one basket. A separate concern means a separate class. Your overall design should be flexible, not so much your classes.
On the blackjack problem
Card games, especially complex and/or evolving ones, are generally rather peculiar animals, and thus probably not a good standard example to experiment with when trying to improve your design skills.
If you want real modularity, you'll probably need a pluggable rules engine that allows plugins to hook in at different stages of a game, giving you access to relevant resources to alter anything from scores to the sequence of events in a turn to even other rules.
My take on this is
You already know your game will evolve in the future and you're going to need such an engine. To answer the "thinking ahead" part of your question, this means you'll start with a simple standard turn structure, a minimal rules engine and incrementally add to it as you implement each feature in your backlog. The thinking ahead you shouldn't do is trying to forecast every little detail in the engine upfront. In other words you'll use YAGNI as in "I ain't gonna need this rule/type of hook into the game" rather than "I ain't gonna need a rules engine" since you know you've got to have one anyway.
Or,
It's going to be, at least at first, a one-shot fixed-rules game. You'll need less raw flexibility in the game system here. Concentrate on use cases and try to make acceptance tests pass with the simplest possible technical solution. Take one little step at a time. Implement a working solution for just 1 deck with simple rules at first, then expand to more complex areas. This hopefully will lead you to a no-nonsense, well-designed system which may or may not involve some kind of rules engine.
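If you do end up needing the rules engine, one possible minimal shape for its hooks (a sketch only; the IGameRule, GamePhase, and GameContext names are hypothetical, not something these answers prescribe) is:

    using System;
    using System.Collections.Generic;

    public enum GamePhase { TurnStart, CardDrawn, TurnEnd }

    // Hypothetical bag of game state the rules may inspect or alter.
    public class GameContext
    {
        public int Score;
        public List<string> Log = new List<string>();
    }

    // Each rule plugs into one or more phases of a turn.
    public interface IGameRule
    {
        void Apply(GamePhase phase, GameContext context);
    }

    public class RulesEngine
    {
        private readonly List<IGameRule> rules = new List<IGameRule>();

        public void Register(IGameRule rule) => rules.Add(rule);

        // The game loop fires hooks; each rule decides whether it cares.
        public void Fire(GamePhase phase, GameContext context)
        {
            foreach (var rule in rules)
                rule.Apply(phase, context);
        }
    }

New rules are then additions (Register) rather than modifications to the game loop, which is the incremental growth the answer describes.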
You might also want to have a look at https://gamedev.stackexchange.com/ for game-specific design guidelines.
When writing code I tend to look for things that may be obvious additions later on. "Obvious" is probably a word to define, though. :-) It means things that you are sure will be in a future release. Other than that, I try not to worry about it.
Let's say I've made a list of concepts I'll use to draw my Domain Model. Furthermore, I have a couple of Use Cases from which I did a couple of System Sequence Diagrams.
When drawing the Domain Model, I never know where to start from:
Designing the model as I believe the system to be. That is, if I am modelling the human body, I start by adding the class concepts of Heart, Brain, Bowels, Stomach, Eyes, Head, etc.
Start by designing what the Use Cases need to get done. That is, if I have a Use Case about making the human body swallow something, I'd first draw the class concepts for Mouth, Throat, Stomach, Bowels, etc.
Is the order in which I do things irrelevant? I'd say it'd probably be best to design from the Use Case concepts, as they are generally what you want to work with, not other kinds of concepts that, although they help describe the whole system well, much of the time might not even be needed for the current project. Is there any other approach that I am not taking into consideration here? How do you usually approach this?
Whether DDD or not, I would recommend starting by determining the ubiquitous language (UL) through interviews with the product owner(s). Establishing communication in a way that has you and the product owners speaking the same language not only aids communication, but being able to discuss the project in common terms tends to help the domain model define itself.
So, my answer is basically to discuss, listen, and learn. Software serves a need. Understanding the model from the viewpoint of the experts will lay the solid groundwork for the application.
I'd start by drawing a class diagram with all the relationships and implement only the classes that are necessary according to the requirements of your application.
You can use an anemic approach (attributes plus getters and setters) to keep things simple and avoid writing the business logic in the same step. With an anemic model, the logic goes into a corresponding Service class. That way you can consider Use Cases later on.
I know some people don't appreciate this way of doing things, but it does help with maintenance and avoids some dependency issues.
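A minimal sketch of that split, reusing the body-modelling example from the question (the names are illustrative): the entity carries only data, while a Service class owns the logic.

    // Anemic entity: attributes plus getters and setters, no business logic.
    public class Stomach
    {
        public int ContentsVolume { get; set; }
        public int Capacity { get; set; }
    }

    // The "swallow" Use Case logic lives in a corresponding Service class.
    public class DigestionService
    {
        public bool Swallow(Stomach stomach, int volume)
        {
            if (stomach.ContentsVolume + volume > stomach.Capacity)
                return false; // the business rule stays out of the entity
            stomach.ContentsVolume += volume;
            return true;
        }
    }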
Answer to devoured elysium's question below:
In terms of analysis, starting with the Use Cases (What) and then proceeding to the class diagram (How) sounds like a good rule of thumb. Personally, I'd do the Sequence diagram (When and Who) afterwards, as you'd need to know between which processes/objects messages need to be sent.
Beyond that, my take on things is that UML is simply a way to model a system/project, not a methodology by itself (unlike Merise, RAD, RUP, Scrum, etc.). There is nothing stopping someone from starting off with any diagram, as long as they have sufficient information to complete it. In fact, the diagrams should be done simultaneously, since each is a different perspective on the same system/project.
So, all in all it depends on how you go about the analysis. During my studies I was taught the rigid waterfall approach, where you do a complete analysis from start to finish before producing some code. However, things can be different in practice, as the imperative might be to produce a working application in the least time possible.
For example, I was introduced to the Scrum methodology recently for an exercise involving the creation of a web site where people can post their fictions. As there was a time constraint and a clear vision of what should be achieved, we started right away with a bare bones class diagram to represent the domain model. The Use cases were then deduced from a series of mock screens we'd produced.
From memory, the classes were Story, Chapter, User and Category. This last class was phased out in favour of a more flexible Tag class. As you'd imagine, the complete class diagram of the existing project would be much more complex due to applying domain driven design and the specificities of the Java programming language.
This approach could be viewed as sloppy. However, a web site like this could easily be made in a couple of weeks using an iterative process and still be well designed. The advantage an iterative process has over the waterfall approach is that you can continually adjust requirements as you go. Frequent requirements changing is a reality, as people will often change their minds and the possibility of producing a working application after each iteration allows one to stay on course so to speak.
Of course, when you're presenting a project to a client, a complete analysis with UML diagrams and some mock screens would be preferable so they have an idea of what you're offering. This is where the UML comes in. Once you've explained some of the visual conventions, an individual should be able to understand the diagrams.
To finish off, if you're in the situation where you're trying to determine what a client wants, it's probably a good idea to gradually build up a questionnaire you can bring with you. Interviewing a person is the only way you can determine what concepts/features are really needed for an application, and you should expect to go back in order to clarify certain aspects. Another tip would be to do some quick research on the web when you're confronted with a subject matter you're unfamiliar with.
In your example, this would be to go through the basics of anatomy. Among other things, this will help you decide what the model should contain and what granularity it should have (Which groups of organs should be considered? How precise does it need to be? Do only the organs need to be modeled, or should they be decomposed into their constituents like tissues, cells, chemical composition, etc.?).
I think the place to start would be whatever feels logical and comfortable. It's probably best to start with the use cases, as they give you clear direction and goals, and help you avoid YAGNI situations. Given that you should be trying to develop a strong domain model, it shouldn't really matter, as the whole picture of the domain is important.
I would like to share my experience with this type of situation. I usually start by writing tests and code, and I try to cover one end-to-end use case. This gives me a fair idea of the problem, and at the end I also have something working that I can showcase to my client. Most of the time subsequent stories build on top of the previous one, but it also happens that subsequent stories require changes to the model I came up with earlier. This does not worry me, since I already have good test coverage. In this way I end up with a model that fits the current problem, not a model that maps the real world.
You start with Business Requirements which can be formalized or not. If formalized you would use Use Case Diagrams.
For example here are use case diagrams for an e-commerce app:
http://askuml.com/blog/e-commerce/
http://askuml.com/files/2010/07/e-commerce-use-case.jpg
http://askuml.com/files/2010/07/e-commerce-use-case2b.jpg
From these use cases, you can naturally deduce the business entities: product, category of product, shopping cart... That is, you can start to prepare class diagrams.
This is best practice in many methodologies, but it is also just common sense and natural.
Short answer
Pick a use case and draw a collaboration diagram (and a class diagram) to realize the domain objects involved. Concentrate only on the objects that participate in accomplishing the use case goal. Write TDD test cases to set the expectations, and gradually model your domain classes to meet those expectations. TDD is very helpful for understanding the expected behaviors, and it helps produce a cleaner domain model. You will see your domain evolve gradually along with the TDD expectations.
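As a sketch of what writing the expectation first can look like (assuming an xUnit-style test framework; the blackjack Hand is illustrative, borrowed from the earlier question, not something this answer prescribes):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Xunit;

    public enum Rank
    {
        Two = 2, Three, Four, Five, Six, Seven,
        Eight, Nine, Ten, Jack, Queen, King, Ace
    }

    // Domain class grown just far enough to satisfy the expectations.
    public class Hand
    {
        private readonly List<Rank> cards = new List<Rank>();

        public void Add(Rank rank) => cards.Add(rank);

        public int Score
        {
            get
            {
                int score = cards.Sum(r => r == Rank.Ace ? 11 : Math.Min((int)r, 10));
                int aces = cards.Count(r => r == Rank.Ace);
                while (score > 21 && aces-- > 0) score -= 10; // demote an ace from 11 to 1
                return score;
            }
        }
    }

    public class HandScoringTests
    {
        // The expectation is written first; the model evolves to meet it.
        [Fact]
        public void Ace_counts_as_eleven_when_it_does_not_bust_the_hand()
        {
            var hand = new Hand();
            hand.Add(Rank.Ace);
            hand.Add(Rank.Seven);
            Assert.Equal(18, hand.Score);
        }
    }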
Long answer
My personal experience with DDD was not easy, because we didn't have the necessary foundations in the first place. Our team had many weak points in different areas: requirements were not captured properly, and the only customer representative we had was not really helpful (not involved). We didn't have a proper release plan, and the developers lacked Object Oriented concepts, best practices, and so on.

The major problem was spending so much time trying to understand the domain logic. We sketched many class diagrams and never got the domain model right, so we stopped and looked at what had gone wrong. We had tried too hard to understand the domain logic, and instead of communicating we had made assumptions about the requirements.

We decided to change our approach. We applied TDD: we started writing the expected behavior and coded the domain model to meet those expectations. Sometimes we got stuck writing test cases because we didn't understand the domain, so we talked to the customer representative straight away and tried to get more input. We also changed our release strategy: we applied an agile methodology and released frequently so that we got real feedback from the end users, while ensuring end-user expectations were set at the right level. We refactored based on the feedback, and in that way the domain model evolved gradually. Subsequently, we applied design patterns to improve reusability and maintainability.

My point here is that DDD cannot survive alone. We have to build an ecosystem that embraces the domain; developers must have strong OOP concepts and must appreciate TDD and unit testing. I would say DDD sits on top of all the OOP techniques and practices.
From what I understand, OOP is the most commonly used paradigm for large-scale projects. I also know that some smaller subsets of big systems use other paradigms (e.g. SQL, which is declarative), and I also realize that at lower levels of computing OOP isn't really feasible. But it seems to me that usually the pieces of higher-level solutions are almost always put together in an OOP fashion.
Are there any scenarios where a truly non-OOP paradigm is actually a better choice for a large-scale solution? Or is that unheard of these days?
I've wondered this ever since I've started studying CS; it's easy to get the feeling that OOP is some nirvana of programming that will never be surpassed.
In my opinion, the reason OOP is used so widely isn't so much that it's the right tool for the job. I think it's more that a solution can be described to the customer in a way that they understand.
A CAR is a VEHICLE that has an ENGINE. That's programming and real world all in one!
It's hard to comprehend anything that can fit the programming and real world quite so elegantly.
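As a toy illustration of why that pitch works (the names are purely illustrative), the customer's sentence maps almost word-for-word onto the type system:

    // A CAR is a VEHICLE (inheritance) that has an ENGINE (composition).
    public class Engine { }

    public abstract class Vehicle { }

    public class Car : Vehicle
    {
        private readonly Engine engine = new Engine();
    }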
Linux is a large-scale project that's very much not OOP. And it wouldn't have a lot to gain from it either.
I think OOP has a good ring to it because it has associated itself with good programming practices like encapsulation, data hiding, code reuse, modularity, etc. But these virtues are by no means unique to OOP.
You might have a look at Erlang, written by Joe Armstrong.
Wikipedia:

"Erlang is a general-purpose concurrent programming language and runtime system. The sequential subset of Erlang is a functional language, with strict evaluation, single assignment, and dynamic typing."
Joe Armstrong:

"Because the problem with object-oriented languages is they've got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle."
The promise of OOP was code reuse and easier maintenance. I am not sure it delivered. I see things such as .NET as being much the same as the C libraries we used to get from various vendors. You can call that code reuse if you want. As for maintenance: bad code is bad code, and OOP did not help.
I'm the biggest fan of OOP, and I practice OOP every day.
It's the most natural way to write code, because it resembles real life.
Though, I realize that OOP's reliance on virtual dispatch and other indirections might cause performance issues. Of course, that depends on your design, the language, and the platform you chose (systems written in garbage-collected languages such as Java or C# might perform worse than systems written in C++, for example).
I guess in Real-time systems, procedural programming may be more appropriate.
Note that not all projects that claim to be OOP are in fact OOP. Sometimes the majority of the code is procedural, or the data model is anemic, and so on...
Zyx, you wrote, "Most of the systems use relational databases ..."
I'm afraid there's no such thing. The relational model will be 40 years old next year and has still never been implemented. I think you mean, "SQL databases." You should read anything by Fabian Pascal to understand the difference between a relational dbms and an SQL dbms.
" ... the relational model is usually chosen due to its popularity,"
True, it's popular.
" ... availability of tools,"
Alas without the main tool necessary: an implementation of the relational model.
" support,"
Yup, the relational model has fine support, I'm sure, but it's entirely unsupported by a dbms implementation.
" and the fact that the relational model is in fact a mathematical concept,"
Yes, it's a mathematical concept, but, not being implemented, it's largely restricted to the ivory towers. String theory is also a mathematical concept, but I wouldn't implement a system with it. In fact, despite its being a mathematical concept, it is certainly not a science (as in computer science), because it lacks the first requirement of any science: falsifiability. There is no implementation of a relational DBMS against which we can check its claims.
It's pure snake oil.
" ... contrary to OOP."
And contrary to OOP, the relational model has never been implemented.
Buy a book on SQL and get productive.
Leave the relational model to unproductive theorists.
See this and this. Apparently you can use C# with five different programming paradigms, C++ with three, etc.
Software construction is not akin to fundamental physics. Physics strives to describe reality using paradigms which may be challenged by new experimental data and/or theories. Physics is a science that searches for a "truth", in a way that software construction doesn't.
Software construction is a business. You need to be productive, i.e. to achieve some goals for which someone will pay money. Paradigms are used because they are useful to produce software effectively. You don't need everyone to agree. If I do OOP and it's working well for me, I don't care if a "new" paradigm would potentially be 20% more useful to me if I had the time and money to learn it and later rethink the whole software structure I'm working in and redesign it from scratch.
Also, you may be using another paradigm and I'll still be happy, in the same way that I can make money running a Japanese food restaurant and you can make money with a Mexican food restaurant next door. I don't need to discuss with you whether Japanese food is better than Mexican food.
I doubt OOP is going away any time soon, it just fits our problems and mental models far too well.
What we're starting to see, though, is multi-paradigm approaches, with declarative and functional ideas being incorporated into object-oriented designs. Most of the newer JVM languages are a good example of this (JavaFX, Scala, Clojure, etc.), as are LINQ and F# on the .NET platform.
It's important to note that I'm not talking about replacing OO here, but about complementing it.
"JavaFX has shown that a declarative solution goes beyond SQL and XSLT, and can also be used for binding properties and events between visual components in a GUI."

"For fault tolerant and highly concurrent systems, functional programming is a very good fit, as demonstrated by the Ericsson AXD301 (programmed using Erlang)."
So... as concurrency becomes more important and FP becomes more popular, I imagine that languages not supporting this paradigm will suffer. This includes many that are currently popular such as C++, Java and Ruby, though JavaScript should cope very nicely.
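To make the complementing-OO point concrete, here is a small C# LINQ example of declarative style embedded in an otherwise object-oriented language (the data is made up):

    using System;
    using System.Linq;

    class Program
    {
        static void Main()
        {
            var words = new[] { "object", "functional", "declarative", "oop" };

            // Declarative: say *what* you want, not *how* to loop for it.
            var shortWords = words
                .Where(w => w.Length <= 4)
                .OrderBy(w => w)
                .ToList();

            shortWords.ForEach(Console.WriteLine); // prints: oop
        }
    }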
Using OOP makes the code easier to manage (as in modify/update/add new features) and understand. This is especially true with bigger projects. Because modules/objects encapsulate their data and the operations on that data, it is easier to comprehend the functionality and the big picture.
The benefit of OOP is that it is easier to discuss (with other developers/management/the customer) a LogManager or OrderManager, each of which encompasses specific functionality, than to describe 'a group of methods that dump the data in a file' and 'the methods that keep track of order details'.
So I guess OOP is helpful especially with big projects, but there are always new concepts turning up, so keep a lookout for new stuff in the future; evaluate it and keep what is useful.
People like to think of various things as "objects" and classify them, so there is no doubt why OOP is so popular. However, there are some areas where OOP has not gained much ground. Most systems use relational databases rather than object databases. Even though the latter hold some notable records and are better for some types of tasks, the relational model is usually chosen due to its popularity, the availability of tools and support, and the fact that the relational model is in fact a mathematical concept, contrary to OOP.
Another area where I have never seen OOP is the software build process. All the configuration and make scripts are procedural, partly because of the lack of support for OOP in shell languages, and partly because OOP is too complex for such tasks.
Slightly controversial opinion from me but I don't find OOP, at least of a kind that is popularly applied now, to be that helpful in producing the largest scale software in my particular domain (VFX, which is somewhat similar in scene organization and application state as games). I find it very useful on a medium to smaller scale. I have to be a bit careful here since I've invited some mobs in the past, but I should qualify that this is in my narrow experience in my particular type of domain.
The difficulty I've often found is that if you have all these small concrete objects encapsulating data, they now all want to talk to each other. The interactions between them can get extremely complex, with nearly every object talking to many others (and much, much more complex in a real application spanning thousands of objects).
And this is not a dependency graph directly related to coupling so much as an "interaction graph". There could be abstractions to decouple these concrete objects from each other. Foo might not talk to Bar directly. It might instead talk to it through IBar or something of this sort. This graph would still connect Foo to Bar since, albeit being decoupled, they still talk to each other.
And all this communication between small and medium-sized objects which make up their own little ecosystem, if applied to the entire scale of a large codebase in my domain, can become extremely difficult to maintain. And it becomes so difficult to maintain because it's hard to reason about what happens with all these interactions between objects with respect to things like side effects.
Instead, what I've found useful is to organize the overall codebase into completely independent, hefty subsystems that access a central "database". Each subsystem then inputs and outputs data. Some other subsystems might access the same data, but without any of the systems directly talking to each other.
Each individual subsystem no longer attempts to encapsulate state. It doesn't try to become its own ecosystem. It instead reads and writes data in the central database.
Of course, in the implementation of each subsystem, they might use a number of objects to help implement them. And that's where I find OOP very useful: in the implementation of these subsystems. But each of these subsystems constitutes a relatively medium to small-scale project, not too large, and it's at that medium to smaller scale that I find OOP very useful.
"Assembly-Line Programming" With Minimum Knowledge
This allows each subsystem to just focus on doing its thing with almost no knowledge of what's going on in the outside world. A developer focusing on physics can just sit down with the physics subsystem and know little about how the software works except that there's a central database from which he can retrieve things like motion components (just data) and transform them by applying physics to that data. And that makes his job very simple and makes it so he can do what he does best with the minimum knowledge of how everything else works. Input central data and output central data: that's all each subsystem has to do correctly for everything else to work. It's the closest thing I've found in my field to "assembly line programming" where each developer can do his thing with minimum knowledge about how the overall system works.
Testing is still also quite simple because of the narrow focus of each subsystem. We're no longer mocking concrete objects with dependency injection so much as generating a minimum amount of data relevant to a particular system and testing whether the particular system provides the correct output for a given input. With so few systems to test (just dozens can make up a complex software), it also reduces the number of tests required substantially.
Breaking Encapsulation
The system then turns into a rather flat pipeline transforming central application state through independent subsystems that are practically oblivious to each other's existence. One might sometimes push a central event to the database which another system processes, but that other system is still oblivious to where that event came from. I've found this is the key to tackling complexity, at least in my domain, and it is achieved effectively through an entity-component system (ECS).
Yet it resembles something closer to procedural or functional programming at the broad scale, decoupling all these subsystems and letting them work with minimal knowledge of the outside world, since we're breaking encapsulation in order to achieve this and avoid requiring the systems to talk to each other. When you zoom in, you might find your share of objects being used to implement any one of these subsystems, but at the broadest scale, the system resembles something other than OOP.
Global Data
I have to admit that I was very hesitant about applying ECS at first to an architectural design in my domain since, first, it hadn't been done before to my knowledge in popular commercial competitors (3DS Max, SoftImage, etc), and second, it looks like a whole bunch of globally-accessible data.
I've found, however, that this is not a big problem. We can still very effectively maintain invariants, perhaps even better than before. The reason is due to the way the ECS organizes everything into systems and components. You can rest assured that an audio system won't try to mutate a motion component, e.g., not even under the hackiest of situations. Even with a poorly-coordinated team, it's very improbable that the ECS will degrade into something where you can no longer reason about which systems access which component, since it's rather obvious on paper and there are virtually no reasons whatsoever for a certain system to access an inappropriate component.
On the contrary, it often removed many of the former temptations for hacky things, with the data wide open, since a lot of the hacky things done in our former codebase under loose coordination and crunch time were hasty attempts to x-ray abstractions and access the internals of the ecosystems of objects. The abstractions started to become leaky as a result of people, in a hurry, trying to just get at and use the data they wanted to access. They were basically jumping through hoops trying to just access data, which led to interface designs degrading quickly.
There is something vaguely resembling encapsulation still, just due to the way the system is organized, since there's often only one system modifying a particular type of component (two in some exceptional cases). But the systems don't own that data, and they don't provide functions to retrieve it. The systems don't talk to each other. They all operate through the central ECS database (which is the only dependency that has to be injected into all these systems).
Flexibility and Extensibility
This is already widely discussed in external resources about entity-component systems, but they are extremely flexible at adapting to radically new design ideas; in hindsight, even concept-breaking ones, like a suggestion for a creature which is a mammal, an insect, and a plant that sprouts leaves under sunlight, all at once.
One of the reasons is because there are no central abstractions to break. You introduce some new components if you need more data for this or just create an entity which strings together the components required for a plant, mammal, and insect. The systems designed to process insect, mammal, and plant components then automatically pick it up and you might get the behavior you want without changing anything besides adding a line of code to instantiate an entity with a new combo of components. When you need whole new functionality, you just add a new system or modify an existing one.
What I haven't found discussed so much elsewhere is how much this eases maintenance even in scenarios when there are no concept-breaking design changes that we failed to anticipate. Even ignoring the flexibility of the ECS, it can really simplify things when your codebase reaches a certain scale.
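A highly simplified sketch of the shape being described (MotionComponent, PhysicsSystem, and the rest are illustrative, not the author's actual code): components are plain data, entities are just bags of components in the central database, and systems transform that data without talking to each other.

    using System.Collections.Generic;
    using System.Linq;

    // Components: plain data, no behaviour.
    public class MotionComponent { public float X, Y, VelX, VelY; }
    public class SpriteComponent { public string ImagePath; }

    // The central "database": an entity is an id plus a bag of components.
    public class EcsDatabase
    {
        private readonly Dictionary<int, List<object>> entities =
            new Dictionary<int, List<object>>();
        private int nextId;

        public int CreateEntity(params object[] components)
        {
            entities[nextId] = components.ToList();
            return nextId++;
        }

        public IEnumerable<T> ComponentsOf<T>() =>
            entities.Values.SelectMany(bag => bag.OfType<T>());
    }

    // A system reads and writes central data; it knows nothing of other systems.
    public class PhysicsSystem
    {
        public void Update(EcsDatabase db, float dt)
        {
            foreach (var m in db.ComponentsOf<MotionComponent>())
            {
                m.X += m.VelX * dt;
                m.Y += m.VelY * dt;
            }
        }
    }

A new kind of entity is then just db.CreateEntity(new MotionComponent(), new SpriteComponent()) with whatever component combination is needed; the existing systems pick it up automatically.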
Turning Objects Into Data
In a previous OOP-heavy codebase where I saw the difficulty of maintaining something closer to the first interaction graph described above, the amount of code required exploded, because the analogical Car had to be built as a completely separate subtype (class) implementing multiple interfaces. So we had an explosive number of objects in the system: a separate object for point lights versus directional lights, a separate object for one kind of fish-eye camera versus another, etc. We had thousands of objects implementing a few dozen abstract interfaces in endless combinations.
When I compared it to the ECS, that required only hundreds, and we were able to do the exact same things as before using a small fraction of the code, because turning the analogical Car into an entity means it no longer requires its own class. It becomes a simple collection of component data, a generalized instance of just one Entity type.
OOP Alternatives
So there are cases like this where OOP applied in excess at the broadest level of the design can start to really degrade maintainability. At the broadest birds-eye view of your system, it can help to flatten it and not try to model it so "deep" with objects interacting with objects interacting with objects, however abstractly.
Comparing the two systems I worked on in the past and now, the new one has more features but takes hundreds of thousands of LOC. The former required over 20 million LOC. Of course it's not the fairest comparison since the former one had a huge legacy, but if you take a slice of the two systems which are functionally quite equal without the legacy baggage (at least about as close to equal as we might get), the ECS takes a small fraction of the code to do the same thing, and partly because it dramatically reduces the number of classes there are in the system by turning them into collections (entities) of raw data (components) with hefty systems to process them instead of a boatload of small/medium objects.
Are there any scenarios where a truly non-OOP paradigm is actually a better choice for a large-scale solution? Or is that unheard of these days?
It's far from unheard of. The system I'm describing above, for example, is widely used in games. It's quite rare in my field (most of the architectures in my field are COM-like with pure interfaces, and that's the type of architecture I worked on in the past), but I've found that peering over at what game developers are doing when designing an architecture made a world of difference in being able to create something that still remains very comprehensible as it grows and grows.
That said, some people consider ECS to be a type of object-oriented programming on its own. If so, it doesn't resemble OOP of a kind most of us would think of, since data (components and entities to compose them) and functionality (systems) are separated. It requires abandoning encapsulation at the broad system level which is often considered one of the most fundamental aspects of OOP.
High-Level Coding
But it seems to me that usually the pieces of higher-level solutions are almost always put together in an OOP fashion.
If you can piece together an application with very high-level code, then it tends to be rather small or medium in scale as far as the code your team has to maintain and can probably be assembled very effectively using OOP.
In my field in VFX, we often have to do things that are relatively low-level like raytracing, image processing, mesh processing, fluid dynamics, etc, and can't just piece these together from third party products since we're actually competing more in terms of what we can do at the low-level (users get more excited about cutting-edge, competitive production rendering improvements than, say, a nicer GUI). So there can be lots and lots of code ranging from very low-level shuffling of bits and bytes to very high-level code that scripters write through embedded scripting languages.
Interweb of Communication
But with any type of application, high-level or low-level or a combination, there comes a point at a large enough scale, revolving around a very complex central application state, where I've found it no longer useful to try to encapsulate everything into objects. Doing so tends to multiply complexity and the difficulty of reasoning about what goes on, because of the multiplied amount of interaction between everything. It is no longer easy to reason about thousands of ecosystems talking to each other unless there is a breaking point, at a large enough scale, where we stop modeling each thing as an encapsulated ecosystem that has to talk to the others. Even if each one is individually simple, everything taken in as a whole can start to more than overwhelm the mind, and we often have to take a whole lot of it in to make changes, add new features, and debug things if the design of an entire large-scale system revolves solely around OOP principles. It can help to break free of encapsulation at some scale, for at least some domains.
At that point it's not necessarily so useful anymore to, say, have a physics system encapsulate its own data (otherwise many things could want to talk to it and retrieve that data as well as initialize it with the appropriate input data), and that's where I found this alternative through ECS so helpful, since it turns the analogical physics system, and all such hefty systems, into a "central database transformer" or a "central database reader which outputs something new" which can now be oblivious about each other. Each system then starts to resemble more like a process in a flat pipeline than an object which forms a node in a very complex graph of communication.
When starting a new application what are some ways to decide what objects you will need, what they should do, and how they should interact with each other?
Is this a job for a white board, or is it easier to just start coding and move stuff around as needed?
CRC cards, or Class-Responsibility-Collaboration cards, are a good way to do this; see the Wikipedia page.
They're a good way to brainstorm (as a team or just by yourself) which classes you'll need, what each class will be responsible for, and how they'll interact with each other.
You might want to try:
CRC Cards
The cards will help define the object, its responsibilities, and its collaborations.
There's an entire process that you go through when creating the CRC cards. That process defines the rules that help make each class concise and really only perform the operations it needs.
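For example, a CRC card for a Deck class in the earlier blackjack question might be filled in like this (the contents are illustrative):

    Class: Deck
    -----------------------------------------------
    Responsibilities:           Collaborators:
      Holds the cards             Card
      Shuffles itself             Dealer
      Deals the top card
    -----------------------------------------------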
This should fall more or less 'naturally' from the requirements.
Do you have a solid understanding of what the app is supposed to do, its scope, et al?
It really depends on the size of the project you're talking about: the approach and rigor one must apply are different for a team building commercial software than for a personal project.
That said, some general tips are:
1) On your medium of choice (I like whiteboards) start enumerating the use cases or user stories. Keep going until you feel like you've gotten the most important/encompassing 80% covered.
2) When you're satisfied that you have the "WHAT" (use cases) succinctly and more-or-less sufficiently defined, you can start working out the "HOW" (objects, algorithms, et al). I would suggest a bias towards less complexity: you do not want a complicated and multi-layered object hierarchy unless you really, really need it (and even then, you probably don't).
3) I tend to enforce a "no-coding" rule until #1 and #2 are done, throw-away prototypes or explorations of particular approaches notwithstanding. It's very, very easy to start slinging code and discover you're "trapped" by the object model you came up with before you fully understood what it is you're building.
4) Once you're done with your requirements gathering, you can use any number of approaches to start breaking out functional units/objects/roles/etc. CRC cards are one approach; I've also had success with UML class & sequence diagrams.
I personally like to do lots of whiteboarding in UML; I take pictures with a digital camera frequently to archive thinking/approaches. These are much better than nothing when it comes to poor-man's documentation or answering the question "why did/didn't we..." 2 months down the road.
I think the answer depends on a couple of factors:
what is the size of the project? For a small project it's fairly easy to have a good idea of the three or four objects that might be required and just start coding. For bigger projects it's important to start listing them beforehand so that you can develop good object names and hierarchy.
how many people are on the project? If there is more than just you, you should sit down and plan who is going to be working on what and that requires a list of the classes that are going to be needed.
For anything bigger than a small (one or two day) project, it's worth putting some time into design before you start coding. White boards are fine for this but make sure you take a picture when you're done!
Sequence and class diagrams. These go a long way when you're in design. The last thing you want to do, unless you aren't constrained in terms of time and resources, is just start coding.
Probably best to start on a whiteboard or some design overview. It all depends on what your application needs to do, really. Figure out what actions your application needs to perform, break the application down into appropriate pieces, and then associate objects with the methods you come up with.
Everyone I work with is obsessed with the data-centric approach to enterprise development and hates the idea of using custom collections/objects. What is the best way to convince them otherwise?
Do it by example and tread lightly. Anything stronger will just alienate you from the rest of the team.
Remember to consider the possibility that they're onto something you've missed. Being part of a team means taking turns learning & teaching.
No single person has all the answers.
If you are working on legacy code (e.g., apps ported from .NET 1.x to 2.0 or 3.5), then it would be a bad idea to depart from DataSets. Why change something that already works?
If you are, however, creating new apps, there are a few things that you can cite:
Appeal to the pain of maintaining apps that stick with DataSets
Cite performance benefits for your new approach
Bait them with a good middle ground: move to .NET 3.5 and promote LINQ to SQL, for instance. While still sticking to a data-driven architecture, it is a huge, huge departure from string-indexed DataSets, and it enforces... voila! Custom collections, in a manner that is hidden from them. (See the sketch after this list.)
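To make that middle ground concrete, here is a hedged before/after sketch: string-indexed DataSet access versus the kind of typed entity LINQ to SQL generates (the Orders table and Order class here are hypothetical):

    using System;
    using System.Data;

    class Demo
    {
        static void Main()
        {
            // String-indexed DataSet: typos and wrong casts surface at runtime.
            var orders = new DataTable("Orders");
            orders.Columns.Add("CustomerName", typeof(string));
            orders.Rows.Add("Ada");
            string name1 = (string)orders.Rows[0]["CustomerName"];

            // Typed entity (the shape LINQ to SQL generates): errors surface at compile time.
            var order = new Order { CustomerName = "Ada" };
            string name2 = order.CustomerName;

            Console.WriteLine($"{name1} == {name2}");
        }
    }

    class Order
    {
        public string CustomerName { get; set; }
    }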
What is important is that, whatever approach you use, you remain consistent and are completely honest about the pros and cons of your approaches.
If all else fails (e.g., you have a development team that utterly refuses to budge from old practices and is skeptical of learning new things), this is a very, very clear sign that you've outgrown your team and it's time to leave your company!
"Remember to consider the possibility that they're onto something you've missed. Being part of a team means taking turns learning & teaching."
Seconded. The whole idea that "enterprise development" is somehow distinct from (and usually the implication is 'more important than') normal development really irks me.
If there really is a benefit for using some technology, then you'll need to come up with a considered list of all the pros and cons that would occur if you switched.
Present this list to your co workers along with explanations and examples for each one.
You have to be realistic when creating this list. You can't just say "Saves us lots of time!!! WIN!!" without addressing the fact that sometimes it is going to take MORE time, will require X months to come up to speed on the new tech, etc. You have to show concrete examples where it will save time, and exactly how.
Likewise you can't just skirt over the cons as if they don't matter, your co-workers will call you on it.
If you don't do these things, or come across as just pushing what you personally like, nobody is going to take you seriously, and you'll just get a reputation for being the guy who's full of enthusiasm and energy but has no idea about anything.
By the way, look out for this particular con. It will trump everything unless you have a lot of strong cases for all your other points:
Requires 12+ months work porting our existing code. You lose.
Of course, "it depends" on the situation. Sometimes DataSets or DataTables are more suited, like if it really is pretty light business logic, flat hierarchy of entities/records, or featuring some versioning capabilities.
Custom object collections shine when you want to implement a deep hierarchy/graph of objects that cannot be efficiently represented in flat 2D tables. What you can demonstrate is a large graph of objects in which certain events propagate down the correct branches without invoking inappropriate objects in other branches. That way it is not necessary to loop or Select through each and every DataTable just to get the child records.
For example, in a project I was involved in two and a half years ago, there was a UI module that was supposed to display questions and answer controls in a single WinForms DataGrid (to be more specific, it was Infragistics' UltraGrid). Some of the trickier requirements:
The answer control for a question can be anything - text box, check box options, radio button options, drop-down lists, or even to pop up a custom dialog box that may pull more data from a web service.
Depending on what the user answered, it can trigger more sub-questions to appear directly under the parent question. If a different answer is given later, it should expose another set of sub-questions (if any) related to that answer.
The original implementation was written entirely with DataSets, DataTables, and arrays. The amount of looping through hundreds of rows across multiple tables was purely mind-bending. It did not help that the programmer came from a C++ background and attempted to ref everything (hello, objects living on the heap are accessed through reference variables, like pointers!). Nobody, not even the original programmer, could explain why the code did what it did. I came onto the scene more than six months after this, and it was still flooded with bugs. No wonder the second-generation developer I took over from decided to quit.
After two months of trying to fix the chaotic mess, I took it upon myself to redesign the entire module as an object-oriented graph to solve this problem. Yep, complete with abstract classes (to render a different answer control in a grid cell depending on question type), delegates, and eventing. The end result was a 2D DataGrid bound to a deep hierarchy of questions, naturally sorted according to the parent-child arrangement. When a parent question's answer changed, it raised an event to the child questions, and they automatically showed or hid their rows in the grid according to the parent's answer. Only question objects down that path were affected. The UI responsiveness of this solution was better than the old method's by orders of magnitude.
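Here is a bare-bones sketch of that parent-child eventing; the Question class below is a reconstruction for illustration, not the project's actual code:

    using System;
    using System.Collections.Generic;

    public class Question
    {
        private readonly List<Question> children = new List<Question>();

        public string TriggerAnswer;  // the answer that reveals this sub-question
        public bool Visible { get; private set; }

        public event Action<string> ParentAnswerChanged;

        public Question AddChild(Question child, string triggerAnswer)
        {
            child.TriggerAnswer = triggerAnswer;
            children.Add(child);
            // Only questions down this branch subscribe; others are never invoked.
            ParentAnswerChanged += child.OnParentAnswerChanged;
            return child;
        }

        public void Answer(string value)
        {
            ParentAnswerChanged?.Invoke(value);
        }

        private void OnParentAnswerChanged(string parentAnswer)
        {
            Visible = parentAnswer == TriggerAnswer; // show/hide own grid row
        }
    }

With var sub = root.AddChild(new Question(), "Yes"), calling root.Answer("Yes") flips sub.Visible to true, and no object outside that branch is ever touched.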
Ironically, I wanted to post a question that was the exact opposite of this one. Most of the programmers I've worked with have gone with the custom data objects/collections approach. It breaks my heart to watch someone with their SQL Server table definition open on one monitor, slowly typing up a matching row-wrapper class in Visual Studio on another (complete with private fields and getters/setters for each column). It's especially painful if they're also prone to creating 60-column tables. I know there are ORM systems that can build these classes automagically, but I've seen the manual approach used much more frequently.
Engineering choices always involve trade-offs between the pros and cons of the available options. The DataSet-centric approach has its advantages (a db-table-like in-memory representation of actual db data, classes written by people who know what they're doing, familiarity to a large pool of developers, etc.), as do custom data objects (compile-time checking, users don't need to learn SQL, etc.). If everyone else at your company is going the DataSet route, it's at least technically possible that DataSets are the best choice for what they're doing.
DataSets/tables aren't so bad, are they?
The best advice I can give is to use it as much as you can in your own code, and hopefully, through peer reviews and bug fixes, the other developers will see how the code becomes more readable. (Make sure to push the point when these occurrences happen.)
Ultimately, if the code works, then the rest is semantics, in my view.
I guess you can try selling the idea of O/R mapping and mapper tools. The benefit of treating rows as objects is pretty powerful.
I think you should focus on the performance: create an application that shows the performance difference between using DataSets and using custom entities. Also, try to show them Domain Driven Design principles and how they fit with entity frameworks.
Don't make it a religion or faith discussion. Those are hard to win (and not what you want anyway).
Don't frame it the way you just did in your question. The issue is not getting anyone to agree that this way or that way is the general way they should work. You should talk about how each one needs to think in order to make the right choice at any given time. Give an example of when to use a DataSet, and when not to.
I had developers using DataTables to store data they fetched from the database, and then business logic code using those DataTables... And I showed them how I reduced the time to load a page from 7 seconds at 100% CPU (on the web server) to not being able to see the CPU line move at all, by changing the in-memory object from a DataTable to a Hashtable.
So take an example or case that you think is better implemented differently, and win that battle. Don't fight a high-level war...
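A sketch of the kind of change described, with a Dictionary standing in for the Hashtable (the table schema is made up):

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Linq;

    class Demo
    {
        static void Main()
        {
            var prices = new DataTable();
            prices.Columns.Add("Sku", typeof(string));
            prices.Columns.Add("Price", typeof(decimal));
            prices.Rows.Add("A1", 9.99m);

            // Before: a string-filtered scan of the table on every lookup.
            DataRow[] hits = prices.Select("Sku = 'A1'");
            decimal slow = (decimal)hits[0]["Price"];

            // After: hash the rows once, then every lookup is O(1).
            var bySku = prices.Rows.Cast<DataRow>()
                              .ToDictionary(r => (string)r["Sku"],
                                            r => (decimal)r["Price"]);
            decimal fast = bySku["A1"];

            Console.WriteLine($"{slow} == {fast}");
        }
    }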
If interoperability is or will be a concern down the line, DataSet is definitely not the right direction to go in. You CAN expose DataSets/DataTables over a service, but whether you SHOULD is debatable. If you are talking .NET-to-.NET you're probably OK; otherwise you are going to have a very unhappy client developer on the other side of the fence consuming your service.
You can't convince them otherwise. Pick a smaller challenge or move to a different organization. If your manager respects you, see if you can do a project in the domain-driven style as a sort of technology trial.
If you can profile, just do it and profile. DataSets are heavier than a simple Collection<T>.
DataReaders are faster than using adapters...
Changing behavior in an object is much easier than massaging a DataSet.
Anyway: Just Do It, ask for forgiveness not permission.
Most programmers don't like to stray out of their comfort zones (note that the intersection of the 'most programmers' set and the 'Stack Overflow' set is probably the empty set). "If it worked before (or even just worked), then keep on doing it." The project I'm currently on required a lot of argument to get the older programmers to use XML/schemas/DataSets instead of just CSV files (the previous version of the software used CSVs). It's not perfect; the schemas aren't robust enough at validating the data, but it's a step in the right direction. The code I develop uses OO abstractions on top of the DataSets rather than passing DataSet objects around. Generally, it's best to teach by example, one small step at a time.
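A minimal sketch of such an OO abstraction over a DataSet row (the Customer schema is hypothetical):

    using System.Data;

    // Callers see a Customer object; the DataRow stays an implementation detail.
    public class Customer
    {
        private readonly DataRow row;

        public Customer(DataRow row) { this.row = row; }

        public string Name
        {
            get => (string)row["Name"];
            set => row["Name"] = value;
        }

        public bool IsActive => (bool)row["Active"];
    }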
There is already some very good advice here, but you'll still have a job convincing your colleagues if all you have to back you up is a few supportive comments on Stack Overflow.
And, if they are as sceptical as they sound, you are going to need more ammo.
First, get a copy of Martin Fowler's "Patterns of Enterprise Application Architecture", which contains a detailed analysis of a variety of data access techniques.
Read it.
Then force them all to read it.
Job done.
Data-centric means less code complexity.
Custom objects mean potentially hundreds of additional objects to organize, maintain, and generally live with. They are also going to be a bit faster.
I think it's really a code-complexity vs. performance question, which can be answered by the needs of your app.
Start small. Is there a utility app you can use to illustrate your point?
For instance, at a place where I worked, the main application had a complicated build process, involving changing config files, installing a service, etc.
So I wrote an app to automate the build process. It had a rudimentary WinForms UI. But since we were moving towards WPF, I changed it to a WPF UI while keeping the WinForms UI as well, thanks to Model-View-Presenter. For those who weren't familiar with Model-View-Presenter, it was an easily comprehensible example they could refer to.
Similarly, find something small where you can show them what a non-DataSet app would look like without having to make a major development investment.