I'm just wondering about large projects - say an airlines reservation system, how many classes/objects it is likely to have.
Objects: customer, airplane, airport, route, ticket, order. That's about all I can think of. The project will probably be hundreds of thousands of lines of code, so are there likely to be more classes (that do un-object related things)?
If so, how many classes are there likely to (roughly) be? How many objects (as per my list so not 10,000 customer objects, how many different names) are there likely to be?
There's really no magic formula for calculating optimal number of classes. The architecture you described above may create a very, very simple airline reservation system. As you continue to refactor, add more features, and accommodate special cases, you could end up with many more classes, e.g., MealPreference, CouponCode, Terminal, Gate, Airline, Baggage, BaggageTransfer, RainCheck, FlightUpgrade etc.
As you should (if you want to be agile), only code exactly what you need at the time, planning ahead for ease of extension. However, any project is going to grow in unanticipated ways over time.
For a real world airline reservation system? Thousands. Easily.
I'd guess around half of them are "infrastructure" classes - mostly related to persistence, logging, integration, etc. Maybe a few hundred domain classes (Airline, Airplane, Flight, Passenger, FrequentFlyer, MaintenanceSchedule, WeatherDelay, etc). And then another half of them would be UI related - controllers, views, view models, etc. to support both customer and internal apps.
The number of classes will be only statistical and it knowing it won't help you to establish any best practice, but I would say it can go to thousands of classes.
What it's important is to keep in mind the best practices and naming conventions, it is important to have a good package structure and name your classes according to its purpose, also keep in mind a high level of cohesion for your classes.
So other than satisfying your curiosity the number doesn't matter.
The number of classes is really relative to the design patterns and architecture you build.
For instance, with a simple console application that adds two numbers together, you can have anywhere from 1 to 10 (rough guess) based on the implementation and abstraction used.
You really can't judge how many "objects" an application is going to contain, without knowing the architecture and patterns.
I agree that it is very dependent on a lot of things; code language, application architecture, number of 3rd party extensions (which typically contain a lot of classes that don't actually ever get called by the app, but are included with .dll or .jar files), and a lot on the habit of the developer and if they tend to use giant monolithic classes, or break down everything into an interface, abstract, simple/stub, and real implementation.
However if your just looking for statistics, I was recently working on a large project for General Motors that had I'd say roughly 3000 classes, plus 200 JSP pages, and a few dozen CSS and JS files supporting the UI. Plugged into that you have DB drivers, Spring framework, a couple apache commons libraries, Hibernate ORM, Mule, JSTL, probably some other core components... the web server itself...
This is entirely dependent on the abstraction you use. In my experience, the closer you follow the problem domain (the closer you try to make your code read like what a business person would say), the more classes you'll have.
All the other answers you've received are wise: basically, "It depends."
But in another sense, there is an optimal number of classes you should have for a given number of methods; that is, given a number of methods, there is a specific number of classes (within a non-hierarchical encapsulation context, and to a close approximation) which minimises the potential structural complexity of those methods. (The actual number is given by the second law of encapsulation, and is equal to the square root of the number of methods divided by the number of public methods per class).
So then the question becomes: how many methods will you need? By analogy to the above, in which methods are encapsulated within classes, so blocks of code are encapsulation within methods, and so the number of methods is fixed by the same second law above.
So then the quesiton is: how many code blocks will I have?
That's answered by the other responses you've received: it depends.
For more, see Encapsulation Theory here:
www.EdmundKirwan.com
Regards,
Ed.
Related
When working on a complicated project, many people will involved in the developemnt, over a long time span. Therefore comes the problem of how to get everyone involved to understand the domain model.
When a project is first developed following DDD, it's probably well discussed among all people, and is designed carefully. At this stage it's relatively easy for everyone to understand and agree on the underlying domain model.
However as the project iterated over a longer period, different groups of people may be involved and few people could still hold the full picture. Even if the code is very well maintained, it's hard for non-programmers, including domain expert/ product managers/ testers, to grasp the business rules embeded in the code.
The only way out I could think of, is to keep the documents/umls/graphs well maintained for each changes, and always reflect the underlying model. However I think this is a huge challenge for any non-trival project. And it's very hard to decide how much details need to be included in the document.
Is there any best practice which I could learn from, such that the domain model could be well understood by people, and is also easy to evolve with the product?
Use Behaviour Driven Development (BDD).
BDD is like TDD. But BDD always focuses on testing domain behaviour, and tests are defined using the domain's Ubiquitous Language. All stories/features/scenarios can be written in a structured, human readable form (like this)
And because the tests are tightly coupled to your code, they have to stay in sync (assuming that your team is disciplined in keeping tests up-to-date).
In my experience, this offers the most economical solution for exposing up-to-date domain rules in a moderately readable format.
The most important strategy that comes from the DDD are the Bounded contexts. This corresponds to the (sub)domains in the business. Ideally they should map one to one. So, the first step that should be done is to clearly identify the (sub)domains; this is also the hardest.
Instead of keeping a lot of diagrams with a lot of details that are hard to keep up to date with the project, one should draw a context map.
After this you have already split the problem in pieces that are more manageable. Seeing a context map one could understand the big picture in a manner of minutes.
For each (sub)domain you can keep a list of main business rules. The Task management software (i.e. Jira) could be the single source of truth for these.
In the Vaughn Vernon's Domain-Driven Design Distilled book we can read that we should try to avoid creating technical abstractions that are perhaps too abstract and try to be more explicit by sticking to the concepts of the Ubiquitous Language.
Where I work we've built several tracking applications and in almost every of them there is the problem of having multiple specializations of the same thing, most likely with common behaviors, but different data and validation rules.
For instance, imagine an incident logging application where various kind of incidents are reported over the phone (e.g. car accident, fire, robbery). The information gathering process is similar to every incidents, but the captured data may vary widely as well as the validation rules that constrains this data.
So far, we have always solved these kind of problems with very technical abstractions (this is an oversimplified model, but you should get the idea):
As you can see, the DataValidationRules, DataFields and DataEntries abstractions have very little to do with the business of incident logging. Actually, they are part of a very generic solution to the problem of representing multiple entity specializations with different data in any domain.
I'd like to move away from this kind of very abstract model, but at the same time I do not see what would be the correct approach in making the business concepts explicit. I understand that the answer would be different in every domain, but in essence, should I be looking into having a single class per specialization? E.g. CarAccidentIndicent, FireIncident and RobberyIncident?
With a very limited number of specializations it seems like it could be manageable, but what if I have hundreds of them?
What about the UI? That means I'd have to move away from a generic way of generating the UI as well.
After thinking a little more about it I think I may have found a better and simpler way to express my concerns when it comes to DDD, OO and modeling many specializations.
On the one hand I want to come up with a model that is faithful to the Ubiquitous Language (UL) and model domain concepts explicitly. On the other hand I'm trying to respect the "favor composition over inheritance" mantra I'm so used to apply.
It seems that boths are conflicting because in order to enable composition I'll have to introduce abstractions that are most likely not part of the UL (e.g. Entity--Field composition) and when it comes to explicit modeling I do not see any other way than inheritance with one class per specialization.
Am I wrong in trying to avoid inheritance to represent hundreds of specialized entities that mainly differ in terms of data structure, not behaviors?
Then again, assuming they did differ a lot in terms of behaviors as well I'd have the same dilemma.
Just to be more explicit on the design choices:
In one scenario, composition would be achievable dynamically without requiring multiple classes per specialized compositions:
class Incident {
Set<Detail> details;
IncidentType type;
}
interface Detail {
public DetailType type();
}
class SomeDetail implements Detail {
...
}
class SomeOtherDetail implements Detail {
...
}
In the other scenario compositions are static and do require one class per specialized composition:
class CarAccidentIncident extends Incident {
SomeDetail someDetail;
SomeOtherDetail someOtherDetail;
}
class SomeDetail {}
class SomeOtherDetail {}
Obviously, the second approach is more explicit and offers a natural home for specific behaviors and rules. In the first approach we would have to introduce some abstract and technical concepts like Operation and DetailValidation which may not align well with the UL.
With a small number of different specializations I'd probably choose the latter without a second though, but because there are many of them it seems like I'm leaning more towards dynamic composition (even thought being dynamic is not required). Should I?
When to use DDD?
The thing is, DDD is not necessarily the right fit for all systems. It's particularly well suited to large systems with complex business rules.
If the business rules that need expressing to capture the essence of a FireIncident are simple enough to be encoded in a DataValidationRules record and a set of DataFields, then that suggests that perhaps those rules do not require the complexity of a DDD implementation.
The Domain of Data Validation
However, if you acknowledge that, you can shift your perspective towards intending to actually build a pure data validation engine. The domain of data validation should include entities such as data validation rules, and data fields, and would contemplate such questions related to the lifecycles of rules and fields - e.g. 'what happens if a validation rule changes - do all existing records that have previously been validated need revalidation?'
If the lifecycle of a data validation rule itself is complex enough to warrant it, then by all means, use DDD to implement that domain, although you may still choose to use CRUD if you find there are no complex rules or processes in the domain of data validation.
Who are your Domain Experts?
The further consequence of that is that your domain experts are no longer your end users (the people who know about car accidents and fire incidents) they are now the people (most likely specialists) who craft the validation rules and fields. If using DDD, you need to be asking them what types of rules they need and how they need the rules to work, and implementing using the Ubiquitous Language that they use to talk about the art and process of crafting validation rules.
Those people, in turn, would be 'programming' a next level system (you might say they are using a 4GL language tailored to the domain of incident logging) using your data validation engine. The thing is, their domain experts would be the people who know about car accidents. But the specialists wouldn't strictly be using DDD to craft the rules of a car accident, because they would not be expressing their model in software, but in the constrained language of your data capture and validation engine.
Additions following Update
Have been pondering this since your update and had a few more thoughts/questions:
Data Validation vs Entity Lifecycle/Behavior
Most of your concern is around representing data validation rules on create/update. Something that would help to understand is - what behavior/rules are represented by your entities other than data validation? i.e. in an incident management system, you might track an incident through a set of states such as Reported, WaitingForDispatch, ResponseEnRoute, ResponseOnSite, Resolved, Debriefed? In an insurance system you might track Reported, Verified, AwaitingFunding, Closed, etc.
The reason I ask, is that in the absence of such lifecycle behavior - if the main purpose of your system is pure data validation, then I return to my original thought of wondering if DDD is really the right approach for this system, as DDD brings greatest value when there is complex behavior to be modelled.
If you do have such lifecycles or other complex behavior - then one possibility is to consider the approach from the perspective of different bounded contexts - i.e. have one bounded context for data validation - which uses the approach you've described with more technical abstractions - as it is an efficient way to represent the validations - but another context from the perspective of lifecycle management, in which you could focus more on business abstractions - if all incidents follow similar set of lifecycles, then that context would have a much smaller number of specific entities.
Keeping entities sync'd between contexts is a whole 'nother topic, but not too troublesome if you adopt a service bus or event type technology and publish events when things change.
Updates to Validation Rules?
How do your business experts express requests for changes to validation rules? And how do you implement them? I'm guessing from what you've said, they probably express them in domain terms such as 'FireIncident'. But the implementation is interesting - do you have to craft data modification statements in SQL which get applied as part of a deployment?
Inheritance vs Composition
It seems that boths are conflicting because in order to enable composition I'll have to introduce abstractions that are most likely not part of the UL (e.g. Entity--Field composition)
I do not think this is true - composition does not have to require introducing technical abstractions. With either composition or inheritance, the goal is to distil insights into the domain to discover common patterns.
e.g. look for common behaviours or data validation sets and find the business language term that describes this commonality. e.g. You might find RobberyIncident and FireIncident both apply to Buildings.
If using inheritence you might create a BuildingIncident and RobberyIncident and FireIncident would extend BuildingIncident.
If using composition, you might create a valueobject to represent a Building and both RobberyIncident and FireIncident would contain a Building property. However RobberyIncident would also contain a Robbery property and FireIncident would also contain a Fire property. CarAccidentIncident and CarRobberyIncident would both contain a Car property, but CarRobberyIncident would also contain a Robbery property of the same type as the Robbery property on RobberyIncident - assuming they are truly common behaviours.
You may still have hundreds of classes representing specialised incident types, but they are simply composed of a set of value object properties representing the set of common patterns they are composed of - and those value objects can and should be in terms of ubiquitious language concepts.
My take on this is that not all information is pertinent to the domain.
I think that in many instances we try to apply techniques in an "all-or-nothing" approach whereas we may need to be focusing on the "right tool for the job". In the answer provided by Chris he asks the question "When to use DDD?" and mentions "The thing is, DDD is not necessarily the right fit for all systems." I would argue that DDD may not be appropriate for some parts of a system.
Would DDD be useful to create, say, a word processing application? I don't really think so. Although some good old proper OO would go a long way.
DDD is absolutely great for business behaviour focused bits of a system. However, there are going to be bits that can be modeled in a more technical/generic way that feed into more interesting business functionality. I'm sure that those incidents end up in some business process. An example may be a Claim. The business is very interested in tracking a claim and the claim amount, but where that claim came from isn't all too interesting. For all intents and purposes the "initiating documentation" may be filled in using pen and paper and scanned in to be linked to said claim. One could even start a new claim on the system using a plain text input.
I have been involved in a number of systems where a lot of peripheral data was sucked into the system but actually it wasn't really contributing much (law of diminishing returns and such).
I once worked on a loan system. The original 20 year-old system was re-written in C#. The main moving bits:
Client
Loan Amount
Payment schedule
Financial transactions (interest, payments, etc.)
All-in-all it is really a simple system. Well, 800+ tables later and stacks of developers/BAs and the system is somewhat of a monster. One could even capture stock and title deeds as guarantee. Now, my take would be to scan in some of this information and link it to the loan. However, somehow some business folks decide that they absolutely "must have" this information in the system. It isn't core though, I would say.
On the other end, another system I worked on calculated premiums. It was modeled quite business-like and was quite a maintenance nightmare. It was then re-written very generically by simply defining calculations that work on given inputs. There were some lookup tables for values and so on but no business processing as such.
Sometimes we may need to abstract moving bits into something that makes sense as an input or output and then use that in our domain. I think the UL should be used by ourselves and domain experts but it doesn't mean that we are not going to end up using technical concepts that are not part of the UL, and I think that that is okay. I'm sure a domain expert wouldn't care much for a SqlDbConnection even though we are going to using one of those in our code :) --- similarly we could model some structures outside of the domain proper.
In response to your update and question: I would not create a concrete class unless it really does feature in the UL in a big way. On a side note, I still favour composition over inheritance. I typically implement interfaces where necessary and go with abstract classes when inheriting, just to place some default behaviour when it helps.
The UL, as with any design, represents a model with nuances. We can apply DDD without using domain events. When we do use domain events we may even go with event sourcing. Event sourcing has very little to do with the UL in much the same way that the terms "Aggregate", "Entity", or "Value Object" would. The UL is going to be specific to the domain / domain experts and when we, as domain modelers, talk to each other we can describe various models in terms of DDD tactical patterns in order to bring across some of the specific UL concepts.
We have to listen to how a domain expert describes the problem space. As soon as we hear "When", as stated in so many other places, we know that we are probably dealing with an event. In much the same way we can listen to how a domain expert talks about the aggregates. For instance (totally bogus example):
"When a customer is registered we need to inform the supervisor of the CSR that initiated the request"
More loosely related to your example:
"When an incident takes place we need to capture some specific details regarding the incident. Depending on the type we need to capture different bits and validate that we have sufficient data to process our claim
Between these two we can see a distinct difference in how they are referring to interacting with the problem space. When a domain expert thinks of something in very broad terms I think it is prudent that we do the same.
On the other hand, should the conversation be more along the lines of this:
"When a car accident is registered we need to assign an assessor an wait for an assessment report that has to answer..."
Now we have something much more specific. These are, necessarily, mutually exclusive in that if they only ever talk about specifics then we go with "specific". If they first mention in broad terms and then specifics, we can also work in broad terms.
This is where our modeling is tricky to get right. It is the same nuance as we have in the Address as an aggregate vs value object "debate". It all depends on the context.
These things are going to be tricky and dependent on the domain in order to get right. As Eric Evans did mention: it may take a couple of models to get something that fits just right. This is necessarily so based on one's experience with the domain.
I've found myself reinventing a particular wheel way too often; and I was wondering if there was any sort of standardization that I could draw upon. Specifically, I find myself creating things like:
class Name(dict):
family, first, middle, prefixes, suffixes, titles, etc.
class Person(object):
name, dob, address, etc...
class Client(Person):
account_id, billing_address, intake_date, etc.
class Employee(Person):
tax_id, supervisor, roles, etc.
Now, obviously, the bits of relevant data to every use case are different; but, in the interest of consistency and interoperability, there have to be some standards relating to the abstraction of human persons in software.
I am seeking a generally applicable base abstraction so that I can write consistent, interoperable and readable "people" classes in c++, Python, Java or whatever. Any guidance?
--- EDIT ---
Though, as mentioned, the relevant "person" data tends to vary from one problem domain or use case to another, there are also consistencies between almost all domains that deal with abstractions of humans. Additionally, my experience is that what starts as a simple program addressing a small, well-defined problem domain often evolves into a much more complex application that needs to interoperate or exchange data with programs that deal with very different problem domains.
Since a huge amount of software needs to use an abstraction of "person" in one way or another, and since some of the core bits of data about humans can be expressed quite differently across languages and cultures; having a reference standard for person data could make life much easier as simple programs grow into complex ones.
The closest standard I can think of is the VCARD standard.
That being said different industries have standards for representing people that is generally far more useful than vCard. For example recruiting has HR-XML.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I recently had my code reviewed by a senior developer in my company. He criticized my design for using too many classes. I am interested to hear your reactions.
I was tasked to create a service which produces an xml file as a result of manipulating 3 other xml files. Let's name these aaa.xml, bbb.xml and ccc.xml. The service works in two phases. In phase one it scrubs aaa.xml against bbb.xml. In the second phase, it merges the product of phase one with ccc.xml to produce a final result.
I chose a design with three classes: an XmlService class which used two other classes, a scrubber class and a merger class. I kept the scrubbing and merging classes separate because the both classes were large and featured distinct logic.
I thought my approach was good because it kept my classes small and cohesive. My approach also helped to control the size of my test class.
The senior developer asserted that the scrubbing and merging classes would only be used by the XmlService class, and should therefore be part of it. He felt this would make the XMLService cohesive and this is what being cohesive means according to him. He also feels that breaking up classes this way makes them loose cohesiveness.
The irony is I tried to break these classes to achieve cohesiveness. What do you think? Who is right or wrong? Are we both right? Thank you for your suggestions.
If you follow the single responsibility principle (and based on the tone of your question, I think you do follow it), then the answer is clear:
A class is too big when it does more than one thing; and
A class is too small when it fails to fulfill its purpose.
That's very broad and indeed subjective -- hence the struggle with your colleague. On your side, you can argue:
There's absolutely no problem in creating additional classes -- It's a non-issue, compile-wise and runtime-wise.
The current implementation of the service may suggest that these classes "belong" to it, but that may change.
You can test each functionality separately.
You can apply dependency injection.
You ease the cognitive load of understanding the inner working of the service, because its code is smaller and better organized.
Furthermore, I think your boss has a misguided understanding of cohesion. Think of it as focus: the narrower the focus of your program, the higher the cohesion. If the code on your satellite classes is merged within the service class, the latter becomes less focused (less cohesive). It's generally accepted that higher cohesion is preferred over lower cohesion. Try to enlighten his/her view about it.
Cohesion is a measure of how strongly related is the functionality within a body of code. That said, if merging and scrubbing aren't strongly related, including them both in the same class reduces cohesion.
The Single Responsibility Principle is also worth mentioning here. Creating a class for the sole purpose of scrubbing and another for the sole purpose of merging follows this principle.
I'd say your approach is the better of the two.
What would you name the classes? ScrubXml and MergeXml are nice names. ProcessXML and ScrubAndMergeXml aren't, the first being too general and the second having a conjunction. As long as none of the classes rely at all on the internals of one or the others (i.e., low coupling), you've got a good design there.
Names are very useful in determining cohesion. A cohesive module does one thing, and therefore has a simple specific name. A module that does more than one thing is less cohesive, and needs a more general or more complicated name.
One problem with merging functionality in X into Y if X is only used by Y is the reductio ad absurdam: if your call graph is acyclic, you'll wind up with all your functionality in one class.
As someone who is coming back from the GodClass building fest of several years in duration, and now trying very hard to avoid that mistake in the future; the error of making a 6000 to 10000 line single source unit with a single class, with over 250 methods, 200 data fields, and some methods over 100 lines, the single responsibility principle looks like something worth defending against the predilections of your unenlightened supervisor.
If however, your supervisor and you are disagreeing over a matter of whether 2000 lines of code belong in one class or three, then I think you're both closer to sane, than I was. Maybe it's a matter of scale and perspective. Some object oriented programming aficionados like a certain "Coefficient of Standalone" per class, in direct violation of the generally understood ideas about how to improve cohesion and minimize coupling.
A set of 3 classes that work well together is, objectively, a more object-oriented system, than a single class that does the same thing. one could argue that if you write an application using only one class, that the application is not really object oriented at all.
If the scrubber and merger are not meaningful outside the context of the main class, then I agree with your reviewer, particularly if you've had to expose any implementation details in order to allow this separation. If you're using a language supporting nested private classes or something similar, that might be a reasonable alternative to maintain the logical separation without exposing implementation details to outside consumers of the main class.
This is a very subjective area, and will be different depending on coding and style guidelines, and who approves your code.
That said, if your defense of your design didn't hold up, and your senior team member still insisted on merging your classes, then you have to compromise:
You've already got the logic separated out. Write the one service class and keep the methods separate like other good design, and then write a glue method. Add some comments above each method explaining how they could easily be partitioned to multiple classes if the need arises in the future.
From what I understand, OOP is the most commonly used paradigm for large scale projects. I also know that some smaller subsets of big systems use other paradigms (e.g. SQL, which is declarative), and I also realize that at lower levels of computing OOP isn't really feasible. But it seems to me that usually the pieces of higher level solutions are almost always put together in a OOP fashion.
Are there any scenarios where a truly non-OOP paradigm is actually a better choice for a largescale solution? Or is that unheard of these days?
I've wondered this ever since I've started studying CS; it's easy to get the feeling that OOP is some nirvana of programming that will never be surpassed.
In my opinion, the reason OOP is used so widely isn't so much that it's the right tool for the job. I think it's more that a solution can be described to the customer in a way that they understand.
A CAR is a VEHICLE that has an ENGINE. That's programming and real world all in one!
It's hard to comprehend anything that can fit the programming and real world quite so elegantly.
Linux is a large-scale project that's very much not OOP. And it wouldn't have a lot to gain from it either.
I think OOP has a good ring to it, because it has associated itself with good programming practices like encapsulation, data hiding, code reuse, modularity et.c. But these virtues are by no means unique to OOP.
You might have a look at Erlang, written by Joe Armstrong.
Wikipedia:
"Erlang is a general-purpose
concurrent programming language and
runtime system. The sequential subset
of Erlang is a functional language,
with strict evaluation, single
assignment, and dynamic typing."
Joe Armstrong:
“Because the problem with
object-oriented languages is they’ve
got all this implicit environment that
they carry around with them. You
wanted a banana but what you got was a
gorilla holding the banana and the
entire jungle.”
The promise of OOP was code reuse and easier maintenance. I am not sure it delivered. I see things such as dot net as being much the same as the C libraries we used to get fro various vendors. You can call that code reuse if you want. As for maintenance bad code is bad code. OOP did not help.
I'm the biggest fan of OOP, and I practice OOP every day.
It's the most natural way to write code, because it resembles the real life.
Though, I realize that the OOP's virtualization might cause performance issues.
Of course that depends on your design, the language and the platform you chose (systems written in Garbage collection based languages such as Java or C# might perform worse than systems which were written in C++ for example).
I guess in Real-time systems, procedural programming may be more appropriate.
Note that not all projects that claim to be OOP are in fact OOP. Sometimes the majority of the code is procedural, or the data model is anemic, and so on...
Zyx, you wrote, "Most of the systems use relational databases ..."
I'm afraid there's no such thing. The relational model will be 40 years old next year and has still never been implemented. I think you mean, "SQL databases." You should read anything by Fabian Pascal to understand the difference between a relational dbms and an SQL dbms.
" ... the relational model is usually chosen due to its popularity,"
True, it's popular.
" ... availability of tools,"
Alas without the main tool necessary: an implementation of the relational model.
" support,"
Yup, the relational model has fine support, I'm sure, but it's entirely unsupported by a dbms implementation.
" and the fact that the relational model is in fact a mathematical concept,"
Yes, it's a mathematical concept, but, not being implemented, it's largely restricted to the ivory towers. String theory is also a mathematical concept but I wouldn't implement a system with it.
In fact, despite it's being a methematical concept, it is certainly not a science (as in computer science) because it lacks the first requirement of any science: that it is falsifiable: there's no implementation of a relational dbms against which we can check its claims.
It's pure snake oil.
" ... contrary to OOP."
And contrary to OOP, the relational model has never been implemented.
Buy a book on SQL and get productive.
Leave the relational model to unproductive theorists.
See this and this. Apparently you can use C# with five different programming paradigms, C++ with three, etc.
Software construction is not akin to Fundamental Physics. Physics strive to describe reality using paradigms which may be challenged by new experimental data and/or theories. Physics is a science which searches for a "truth", in a way that Software construction doesn't.
Software construction is a business. You need to be productive, i.e. to achieve some goals for which someone will pay money. Paradigms are used because they are useful to produce software effectively. You don't need everyone to agree. If I do OOP and it's working well for me, I don't care if a "new" paradigm would potentially be 20% more useful to me if I had the time and money to learn it and later rethink the whole software structure I'm working in and redesign it from scratch.
Also, you may be using another paradigm and I'll still be happy, in the same way that I can make money running a Japanese food restaurant and you can make money with a Mexican food restaurant next door. I don't need to discuss with you whether Japanese food is better than Mexican food.
I doubt OOP is going away any time soon, it just fits our problems and mental models far too well.
What we're starting to see though is multi-paradigm approaches, with declarative and functional ideas being incorporated into object oriented designs. Most of the newer JVM languages are a good example of this (JavaFX, Scala, Clojure, etc.) as well as LINQ and F# on the .net platform.
It's important to note that I'm not talking about replacing OO here, but about complementing it.
JavaFX has shown that a declarative
solution goes beyond SQL and XSLT,
and can also be used for binding
properties and events between visual
components in a GUI
For fault tolerant and highly
concurrent systems, functional
programming is a very good fit,
as demonstrated by the Ericsson
AXD301 (programmed using Erlang)
So... as concurrency becomes more important and FP becomes more popular, I imagine that languages not supporting this paradigm will suffer. This includes many that are currently popular such as C++, Java and Ruby, though JavaScript should cope very nicely.
Using OOP makes the code easier to manage (as in modify/update/add new features) and understand. This is especially true with bigger projects. Because modules/objects encapsulate their data and operations on that data it is easier to comprehend the functionality and the big picture.
The benefit of OOP is that it is easier to discuss (with other developers/management/customer) a LogManager or OrderManager, each of which encompass specific functionality, then describing 'a group of methods that dump the data in file' and 'the methods that keep track of order details'.
So I guess OOP is helpful especially with big projects but there are always new concepts turning up so keep on lookout for new stuff in the future, evaluate and keep what is useful.
People like to think of various things as "objects" and classify them, so no doubt that OOP is so popular. However, there are some areas where OOP has not gained a bigger popularity. Most of the systems use relational databases rather than objective. Even if the second ones hold some notable records and are better for some types of tasks, the relational model is unsually chosen due to its popularity, availability of tools, support and the fact that the relational model is in fact a mathematical concept, contrary to OOP.
Another area where I have never seen OOP is the software building process. All the configuration and make scripts are procedural, partially because of the lack of the support for OOP in shell languages, partially because OOP is too complex for such tasks.
Slightly controversial opinion from me but I don't find OOP, at least of a kind that is popularly applied now, to be that helpful in producing the largest scale software in my particular domain (VFX, which is somewhat similar in scene organization and application state as games). I find it very useful on a medium to smaller scale. I have to be a bit careful here since I've invited some mobs in the past, but I should qualify that this is in my narrow experience in my particular type of domain.
The difficulty I've often found is that if you have all these small concrete objects encapsulating data, they now want to all talk to each other. The interactions between them can get extremely complex, like so (except much, much more complex in a real application spanning thousands of objects):
And this is not a dependency graph directly related to coupling so much as an "interaction graph". There could be abstractions to decouple these concrete objects from each other. Foo might not talk to Bar directly. It might instead talk to it through IBar or something of this sort. This graph would still connect Foo to Bar since, albeit being decoupled, they still talk to each other.
And all this communication between small and medium-sized objects which make up their own little ecosystem, if applied to the entire scale of a large codebase in my domain, can become extremely difficult to maintain. And it becomes so difficult to maintain because it's hard to reason about what happens with all these interactions between objects with respect to things like side effects.
Instead what I've found useful is to organize the overall codebase into completely independent, hefty subsystems that access a central "database". Each subsystem then inputs and outputs data. Some other subsystems might access the same data, but without any one system directly talking to each other.
... or this:
... and each individual system no longer attempts to encapsulate state. It doesn't try to become its own ecosystem. It instead reads and writes data in the central database.
Of course in the implementation of each subsystem, they might use a number of objects to help implement them. And that's where I find OOP very useful is in the implementation of these subsystems. But each of these subsystems constitutes a relatively medium to small-scale project, not too large, and it's at that medium to smaller scale that I find OOP very useful.
"Assembly-Line Programming" With Minimum Knowledge
This allows each subsystem to just focus on doing its thing with almost no knowledge of what's going on in the outside world. A developer focusing on physics can just sit down with the physics subsystem and know little about how the software works except that there's a central database from which he can retrieve things like motion components (just data) and transform them by applying physics to that data. And that makes his job very simple and makes it so he can do what he does best with the minimum knowledge of how everything else works. Input central data and output central data: that's all each subsystem has to do correctly for everything else to work. It's the closest thing I've found in my field to "assembly line programming" where each developer can do his thing with minimum knowledge about how the overall system works.
Testing is still also quite simple because of the narrow focus of each subsystem. We're no longer mocking concrete objects with dependency injection so much as generating a minimum amount of data relevant to a particular system and testing whether the particular system provides the correct output for a given input. With so few systems to test (just dozens can make up a complex software), it also reduces the number of tests required substantially.
Breaking Encapsulation
The system then turns into a rather flat pipeline transforming central application state through independent subsystems that are practically oblivious to each other's existence. One might sometimes push a central event to the database which another system processes, but that other system is still oblivious about where that event came from. I've found this is the key to tackling complexity at least in my domain, and it is effectively through an entity-component system.
Yet it resembles something closer to procedural or functional programming at the broad scale to decouple all these subsystems and let them work with minimal knowledge of the outside world since we're breaking encapsulation in order to achieve this and avoid requiring the systems to talk to each other. When you zoom in, then you might find your share of objects being used to implement any one of these subsystems, but at the broadest scale, the systems resembles something other than OOP.
Global Data
I have to admit that I was very hesitant about applying ECS at first to an architectural design in my domain since, first, it hadn't been done before to my knowledge in popular commercial competitors (3DS Max, SoftImage, etc), and second, it looks like a whole bunch of globally-accessible data.
I've found, however, that this is not a big problem. We can still very effectively maintain invariants, perhaps even better than before. The reason is due to the way the ECS organizes everything into systems and components. You can rest assured that an audio system won't try to mutate a motion component, e.g., not even under the hackiest of situations. Even with a poorly-coordinated team, it's very improbable that the ECS will degrade into something where you can no longer reason about which systems access which component, since it's rather obvious on paper and there are virtually no reasons whatsoever for a certain system to access an inappropriate component.
To the contrary it often removed many of the former temptations for hacky things with the data wide open since a lot of the hacky things done in our former codebase under loose coordination and crunch time was done in hasty attempts to x-ray abstractions and try to access the internals of the ecosystems of objects. The abstractions started to become leaky as a result of people, in a hurry, trying to just get and do things with the data they wanted to access. They were basically jumping through hoops trying to just access data which lead to interface designs degrading quickly.
There is something vaguely resembling encapsulation still just due to the way the system is organized since there's often only one system modifying a particular type of components (two in some exceptional cases). But they don't own that data, they don't provide functions to retrieve that data. The systems don't talk to each other. They all operate through the central ECS database (which is the only dependency that has to be injected into all these systems).
Flexibility and Extensibility
This is already widely-discussed in external resources about entity-component systems but they are extremely flexible at adapting to radically new design ideas
in hindsight, even concept-breaking ones like a suggestion for a creature which is a mammal, insect, and plant that sprouts leaves under sunlight all at once.
One of the reasons is because there are no central abstractions to break. You introduce some new components if you need more data for this or just create an entity which strings together the components required for a plant, mammal, and insect. The systems designed to process insect, mammal, and plant components then automatically pick it up and you might get the behavior you want without changing anything besides adding a line of code to instantiate an entity with a new combo of components. When you need whole new functionality, you just add a new system or modify an existing one.
What I haven't found discussed so much elsewhere is how much this eases maintenance even in scenarios when there are no concept-breaking design changes that we failed to anticipate. Even ignoring the flexibility of the ECS, it can really simplify things when your codebase reaches a certain scale.
Turning Objects Into Data
In a previous OOP-heavy codebase where I saw the difficulty of maintaining a codebase closer to the first graph above, the amount of code required exploded because the analogical Car in this diagram:
... had to be built as a completely separate subtype (class) implementing multiple interfaces. So we had an explosive number of objects in the system: a separate object for point lights from directional lights, a separate object for a fish eye camera from another, etc. We had thousands of objects implementing a few dozen abstract interfaces in endless combinations.
When I compared it to ECS, that required only hundreds and we were able to do the exact same things before using a small fraction of the code, because that turned the analogical Car entity into something that no longer requires its class. It turns into a simple collection of component data as a generalized instance of just one Entity type.
OOP Alternatives
So there are cases like this where OOP applied in excess at the broadest level of the design can start to really degrade maintainability. At the broadest birds-eye view of your system, it can help to flatten it and not try to model it so "deep" with objects interacting with objects interacting with objects, however abstractly.
Comparing the two systems I worked on in the past and now, the new one has more features but takes hundreds of thousands of LOC. The former required over 20 million LOC. Of course it's not the fairest comparison since the former one had a huge legacy, but if you take a slice of the two systems which are functionally quite equal without the legacy baggage (at least about as close to equal as we might get), the ECS takes a small fraction of the code to do the same thing, and partly because it dramatically reduces the number of classes there are in the system by turning them into collections (entities) of raw data (components) with hefty systems to process them instead of a boatload of small/medium objects.
Are there any scenarios where a truly non-OOP paradigm is actually a
better choice for a largescale solution? Or is that unheard of these
days?
It's far from unheard of. The system I'm describing above, for example, is widely used in games. It's quite rare in my field (most of the architectures in my field are COM-like with pure interfaces, and that's the type of architecture I worked on in the past), but I've found that peering over at what gamers are doing when designing an architecture made a world of difference in being able to create something that still remains very comprehensible at it grows and grows.
That said, some people consider ECS to be a type of object-oriented programming on its own. If so, it doesn't resemble OOP of a kind most of us would think of, since data (components and entities to compose them) and functionality (systems) are separated. It requires abandoning encapsulation at the broad system level which is often considered one of the most fundamental aspects of OOP.
High-Level Coding
But it seems to me that usually the pieces of higher level solutions
are almost always put together in a OOP fashion.
If you can piece together an application with very high-level code, then it tends to be rather small or medium in scale as far as the code your team has to maintain and can probably be assembled very effectively using OOP.
In my field in VFX, we often have to do things that are relatively low-level like raytracing, image processing, mesh processing, fluid dynamics, etc, and can't just piece these together from third party products since we're actually competing more in terms of what we can do at the low-level (users get more excited about cutting-edge, competitive production rendering improvements than, say, a nicer GUI). So there can be lots and lots of code ranging from very low-level shuffling of bits and bytes to very high-level code that scripters write through embedded scripting languages.
Interweb of Communication
But there comes a point with a large enough scale with any type of application, high-level or low-level or a combo, that revolves around a very complex central application state where I've found it no longer useful to try to encapsulate everything into objects. Doing so tends to multiply complexity and the difficulty to reason about what goes on due to the multiplied amount of interaction that goes on between everything. It no longer becomes so easy to reason about thousands of ecosystems talking to each other if there isn't a breaking point at a large enough scale where we stop modeling each thing as encapsulated ecosystems that have to talk to each other. Even if each one is individually simple, everything taken in as a whole can start to more than overwhelm the mind, and we often have to take a whole lot of that in to make changes and add new features and debug things and so forth if you try to revolve the design of an entire large-scale system solely around OOP principles. It can help to break free of encapsulation at some scale for at least some domains.
At that point it's not necessarily so useful anymore to, say, have a physics system encapsulate its own data (otherwise many things could want to talk to it and retrieve that data as well as initialize it with the appropriate input data), and that's where I found this alternative through ECS so helpful, since it turns the analogical physics system, and all such hefty systems, into a "central database transformer" or a "central database reader which outputs something new" which can now be oblivious about each other. Each system then starts to resemble more like a process in a flat pipeline than an object which forms a node in a very complex graph of communication.