Service-Orientation vs Object-Orientation - can they coexist? - oop

There's been a lot of interest in Service-Oriented Architecture (SOA) at my company recently. Whenever I try to see how we might use it, I always run up against a mental block. Crudely:
Object-orientation says: "keep data and methods that manipulate data (business processes) together";
Service-orientation says: "keep the business process in the service, and pass data to it".
Previous attempts to develop SOA have ended up converting object-oriented code into data structures and separate procedures (services) that manipulate them, which seems like a step backwards.
My question is: what patterns, architectures, strategies etc. allow SOA and OO to work together?
Edit: The answers saying "OO for internals, SOA for system boundaries" are great and useful, but this isn't quite what I was getting at.
Let's say you have an Account object which has a business operation called Merge that combines it with another account. A typical OO approach would look like this:
Account mainAccount = database.loadAccount(mainId);
Account lesserAccount = database.loadAccount(lesserId);
mainAccount.mergeWith(lesserAccount);
mainAccount.save();
lesserAccount.delete();
Whereas the SOA equivalent I've seen looks like this:
Account mainAccount = accountService.loadAccount(mainId);
Account lesserAccount = accountService.loadAccount(lesserId);
accountService.merge(mainAccount, lesserAccount);
// save and delete handled by the service
In the OO case the business logic (and entity awareness thanks to an ActiveRecord pattern) are baked into the Account class. In the SOA case the Account object is really just a structure, since all of the business rules are buried in the service.
Can I have rich, functional classes and reusable services at the same time?

My opinion is that SOA can be useful, at a macro level, but each service probably still will be large enough to need several internal components. The internal components may benefit from OO architecture.
The SOA API should be defined more carefully than the internal APIs, since it is an external API. The data types passed at this level should be as simple as possible, with no internal logic. If there is some logic that belongs with the data type (e.g. validation), there should preferably be one service in charge of running the logic on the data type.

SOA is a good architecture for communicating between systems or applications.
Each application defines a "service" interface which consists of the requests it will handle and the responses expected.
The key points here are well defined services, with a well defined interface. How your services are actually implemented is irrelevant as far as SOA is concerned.
So you are free to implement your services using all the latest and greatest OO techniques, or any other methodology that works for you. ( I have seen extreme cases where the "service" is implemented by actual humans entering data on a screen -- yet everything was still text book SOA!).

I really think SOA is only useful for external interfaces (generally speaking, to those outside your company), and even then, only in cases when performance doesn't really matter, you don't need ordered delivery of messages.
In light of that, I think they can coexist. Keep your applications working and communicating using the OO philosophy, and only when external interfaces (to third parties) are needed, expose them via SOA (this is not essential, but it is one way).
I really feel SOA is overused, or at least architectures with SOA are getting proposed too often. I don't really know of any big systems that use SOA internally, and I doubt they could. It seems like more of a thing you might just use to do mashups or simple weather forecast type requests, not build serious systems on top of.

I think that this is a misunderstanding of object orientation. Even in Java, the methods are generally not part of the objects but of their class (and even this "membership" is not necessary for object orientation, but that is a different subject). A class is just a description of a type, so this is really a part of the program, not the data.
SOA and OO do not contradict each other. A service can accept data, organize them into objects internally, work on them, and finally give them back in whatever format is desired.

I've heard James Gosling say that one could implement SOA in COBOL.
If you read Alan Kay's own description of the origins of OOP, it describes a bunch of little computers interacting to perform something useful.
Consider this description:
Your X should be made up of Ys. Each Y should be responsible for a single concept, and should be describable completely in terms of its interface. One Y can ask another Y to do something via an exchange of messages (per their specified interfaces).
In some cases, an X may be implemented by a Z, which it manages according to its interface. No X is allowed direct access to another X's Z.
I think that the following substitutions are possible:
Term Programing Architecture
---- --------------- ------------
X Program System
Y Objects Services
Z Data structure Database
---- --------------- ------------
result OOP SOA
If you think primarily in terms of encapsulation, information hiding, loose coupling, and black-box interfaces, there is quite a bit of similarity. If you get bogged down in polymorphism, inheritance, etc. you're thinking programming / implementation instead of architecture, IMHO.

If you allow your services to remember state, then they can just be considered to be big objects with a possibly slow invocation time.
If they are not allowed to retain state then they are just as you've said - operators on data.
It sounds like you may be dividing your system up into too many services. Do you have written, mutually agreed criteria for how to divide?
Adopting SOA does not mean throw out all your objects but is about dividing your system into large reusable chunks.

Related

Differences between Monolithic Architecture and SOA

I confused these two architecture's differences.
Generally, Monolithic Architecture refers to different components of an application being combined into a single business logic.
And, Service Oriented Architecture refers to an application composed of discrete and loosely coupled software agents that perform a required function.
But I learned, OOP and design-pattern aim "One function should have One responsibility" and "Decrease dependencies to increase reuse of function".
So my question is :
1. Do monolithic architecture follow OOP?
2. Differences between these two architecture.
1. Do monolithic architecture follow OOP?
Good ones do. Yes. It would be even more of a spaghetti code mess if it was procedural or much harder to write if it was functional. In all the pro companies I've worked for they all had OOP and frameworks like MVP they also incorporate. In my experience supporting companies with problem products, they tended to be a mess. Really boated, little reuse, if one 1MB thing needed to scale the only way to do it was scale the entire 100MB solution.
2. Differences between these two architecture.
Service Orientated Architecture IMO is taking the view each function can be an API, its service orientated. Object Orientated Programming has a few tenants, Polymorphic, Reusable, Encapsulated (and Abstract), it's a style of programming, almost opposite to procedural programming etc.
When you design a service, let's say it's a Weather app that tells temp and also does predictions. You will want to have an abstract generic Weather class, make it support Celsius and Fahrenheit implementations and you can get some polymorphism, or perhaps add an Interface to support different locations or altitude's, and use really generic function names that sound abstract like GetTemp (with function overloads by City, State, Altitude) to encapsulate it in the Weather class.
So here you design it with the Service Orientation in mind and a solid OOP assembly.
"One function should have One responsibility" and "Decrease dependencies to increase reuse of function".
For the second service to predict temp's that will make some calculations and take time to compute so we will want to isolate it, so we can scale it independently. You can do this by using a different project and instead of referencing each other you put in Messaging technology (SQS, RabbitMQ, etc) so they are loosely coupled.
Note: We can simply cache the static GetTemp results in the other project to easily scale that part of the system.

Abstractions that are... too abstract?

In the Vaughn Vernon's Domain-Driven Design Distilled book we can read that we should try to avoid creating technical abstractions that are perhaps too abstract and try to be more explicit by sticking to the concepts of the Ubiquitous Language.
Where I work we've built several tracking applications and in almost every of them there is the problem of having multiple specializations of the same thing, most likely with common behaviors, but different data and validation rules.
For instance, imagine an incident logging application where various kind of incidents are reported over the phone (e.g. car accident, fire, robbery). The information gathering process is similar to every incidents, but the captured data may vary widely as well as the validation rules that constrains this data.
So far, we have always solved these kind of problems with very technical abstractions (this is an oversimplified model, but you should get the idea):
As you can see, the DataValidationRules, DataFields and DataEntries abstractions have very little to do with the business of incident logging. Actually, they are part of a very generic solution to the problem of representing multiple entity specializations with different data in any domain.
I'd like to move away from this kind of very abstract model, but at the same time I do not see what would be the correct approach in making the business concepts explicit. I understand that the answer would be different in every domain, but in essence, should I be looking into having a single class per specialization? E.g. CarAccidentIndicent, FireIncident and RobberyIncident?
With a very limited number of specializations it seems like it could be manageable, but what if I have hundreds of them?
What about the UI? That means I'd have to move away from a generic way of generating the UI as well.
After thinking a little more about it I think I may have found a better and simpler way to express my concerns when it comes to DDD, OO and modeling many specializations.
On the one hand I want to come up with a model that is faithful to the Ubiquitous Language (UL) and model domain concepts explicitly. On the other hand I'm trying to respect the "favor composition over inheritance" mantra I'm so used to apply.
It seems that boths are conflicting because in order to enable composition I'll have to introduce abstractions that are most likely not part of the UL (e.g. Entity--Field composition) and when it comes to explicit modeling I do not see any other way than inheritance with one class per specialization.
Am I wrong in trying to avoid inheritance to represent hundreds of specialized entities that mainly differ in terms of data structure, not behaviors?
Then again, assuming they did differ a lot in terms of behaviors as well I'd have the same dilemma.
Just to be more explicit on the design choices:
In one scenario, composition would be achievable dynamically without requiring multiple classes per specialized compositions:
class Incident {
Set<Detail> details;
IncidentType type;
}
interface Detail {
public DetailType type();
}
class SomeDetail implements Detail {
...
}
class SomeOtherDetail implements Detail {
...
}
In the other scenario compositions are static and do require one class per specialized composition:
class CarAccidentIncident extends Incident {
SomeDetail someDetail;
SomeOtherDetail someOtherDetail;
}
class SomeDetail {}
class SomeOtherDetail {}
Obviously, the second approach is more explicit and offers a natural home for specific behaviors and rules. In the first approach we would have to introduce some abstract and technical concepts like Operation and DetailValidation which may not align well with the UL.
With a small number of different specializations I'd probably choose the latter without a second though, but because there are many of them it seems like I'm leaning more towards dynamic composition (even thought being dynamic is not required). Should I?
When to use DDD?
The thing is, DDD is not necessarily the right fit for all systems. It's particularly well suited to large systems with complex business rules.
If the business rules that need expressing to capture the essence of a FireIncident are simple enough to be encoded in a DataValidationRules record and a set of DataFields, then that suggests that perhaps those rules do not require the complexity of a DDD implementation.
The Domain of Data Validation
However, if you acknowledge that, you can shift your perspective towards intending to actually build a pure data validation engine. The domain of data validation should include entities such as data validation rules, and data fields, and would contemplate such questions related to the lifecycles of rules and fields - e.g. 'what happens if a validation rule changes - do all existing records that have previously been validated need revalidation?'
If the lifecycle of a data validation rule itself is complex enough to warrant it, then by all means, use DDD to implement that domain, although you may still choose to use CRUD if you find there are no complex rules or processes in the domain of data validation.
Who are your Domain Experts?
The further consequence of that is that your domain experts are no longer your end users (the people who know about car accidents and fire incidents) they are now the people (most likely specialists) who craft the validation rules and fields. If using DDD, you need to be asking them what types of rules they need and how they need the rules to work, and implementing using the Ubiquitous Language that they use to talk about the art and process of crafting validation rules.
Those people, in turn, would be 'programming' a next level system (you might say they are using a 4GL language tailored to the domain of incident logging) using your data validation engine. The thing is, their domain experts would be the people who know about car accidents. But the specialists wouldn't strictly be using DDD to craft the rules of a car accident, because they would not be expressing their model in software, but in the constrained language of your data capture and validation engine.
Additions following Update
Have been pondering this since your update and had a few more thoughts/questions:
Data Validation vs Entity Lifecycle/Behavior
Most of your concern is around representing data validation rules on create/update. Something that would help to understand is - what behavior/rules are represented by your entities other than data validation? i.e. in an incident management system, you might track an incident through a set of states such as Reported, WaitingForDispatch, ResponseEnRoute, ResponseOnSite, Resolved, Debriefed? In an insurance system you might track Reported, Verified, AwaitingFunding, Closed, etc.
The reason I ask, is that in the absence of such lifecycle behavior - if the main purpose of your system is pure data validation, then I return to my original thought of wondering if DDD is really the right approach for this system, as DDD brings greatest value when there is complex behavior to be modelled.
If you do have such lifecycles or other complex behavior - then one possibility is to consider the approach from the perspective of different bounded contexts - i.e. have one bounded context for data validation - which uses the approach you've described with more technical abstractions - as it is an efficient way to represent the validations - but another context from the perspective of lifecycle management, in which you could focus more on business abstractions - if all incidents follow similar set of lifecycles, then that context would have a much smaller number of specific entities.
Keeping entities sync'd between contexts is a whole 'nother topic, but not too troublesome if you adopt a service bus or event type technology and publish events when things change.
Updates to Validation Rules?
How do your business experts express requests for changes to validation rules? And how do you implement them? I'm guessing from what you've said, they probably express them in domain terms such as 'FireIncident'. But the implementation is interesting - do you have to craft data modification statements in SQL which get applied as part of a deployment?
Inheritance vs Composition
It seems that boths are conflicting because in order to enable composition I'll have to introduce abstractions that are most likely not part of the UL (e.g. Entity--Field composition)
I do not think this is true - composition does not have to require introducing technical abstractions. With either composition or inheritance, the goal is to distil insights into the domain to discover common patterns.
e.g. look for common behaviours or data validation sets and find the business language term that describes this commonality. e.g. You might find RobberyIncident and FireIncident both apply to Buildings.
If using inheritence you might create a BuildingIncident and RobberyIncident and FireIncident would extend BuildingIncident.
If using composition, you might create a valueobject to represent a Building and both RobberyIncident and FireIncident would contain a Building property. However RobberyIncident would also contain a Robbery property and FireIncident would also contain a Fire property. CarAccidentIncident and CarRobberyIncident would both contain a Car property, but CarRobberyIncident would also contain a Robbery property of the same type as the Robbery property on RobberyIncident - assuming they are truly common behaviours.
You may still have hundreds of classes representing specialised incident types, but they are simply composed of a set of value object properties representing the set of common patterns they are composed of - and those value objects can and should be in terms of ubiquitious language concepts.
My take on this is that not all information is pertinent to the domain.
I think that in many instances we try to apply techniques in an "all-or-nothing" approach whereas we may need to be focusing on the "right tool for the job". In the answer provided by Chris he asks the question "When to use DDD?" and mentions "The thing is, DDD is not necessarily the right fit for all systems." I would argue that DDD may not be appropriate for some parts of a system.
Would DDD be useful to create, say, a word processing application? I don't really think so. Although some good old proper OO would go a long way.
DDD is absolutely great for business behaviour focused bits of a system. However, there are going to be bits that can be modeled in a more technical/generic way that feed into more interesting business functionality. I'm sure that those incidents end up in some business process. An example may be a Claim. The business is very interested in tracking a claim and the claim amount, but where that claim came from isn't all too interesting. For all intents and purposes the "initiating documentation" may be filled in using pen and paper and scanned in to be linked to said claim. One could even start a new claim on the system using a plain text input.
I have been involved in a number of systems where a lot of peripheral data was sucked into the system but actually it wasn't really contributing much (law of diminishing returns and such).
I once worked on a loan system. The original 20 year-old system was re-written in C#. The main moving bits:
Client
Loan Amount
Payment schedule
Financial transactions (interest, payments, etc.)
All-in-all it is really a simple system. Well, 800+ tables later and stacks of developers/BAs and the system is somewhat of a monster. One could even capture stock and title deeds as guarantee. Now, my take would be to scan in some of this information and link it to the loan. However, somehow some business folks decide that they absolutely "must have" this information in the system. It isn't core though, I would say.
On the other end, another system I worked on calculated premiums. It was modeled quite business-like and was quite a maintenance nightmare. It was then re-written very generically by simply defining calculations that work on given inputs. There were some lookup tables for values and so on but no business processing as such.
Sometimes we may need to abstract moving bits into something that makes sense as an input or output and then use that in our domain. I think the UL should be used by ourselves and domain experts but it doesn't mean that we are not going to end up using technical concepts that are not part of the UL, and I think that that is okay. I'm sure a domain expert wouldn't care much for a SqlDbConnection even though we are going to using one of those in our code :) --- similarly we could model some structures outside of the domain proper.
In response to your update and question: I would not create a concrete class unless it really does feature in the UL in a big way. On a side note, I still favour composition over inheritance. I typically implement interfaces where necessary and go with abstract classes when inheriting, just to place some default behaviour when it helps.
The UL, as with any design, represents a model with nuances. We can apply DDD without using domain events. When we do use domain events we may even go with event sourcing. Event sourcing has very little to do with the UL in much the same way that the terms "Aggregate", "Entity", or "Value Object" would. The UL is going to be specific to the domain / domain experts and when we, as domain modelers, talk to each other we can describe various models in terms of DDD tactical patterns in order to bring across some of the specific UL concepts.
We have to listen to how a domain expert describes the problem space. As soon as we hear "When", as stated in so many other places, we know that we are probably dealing with an event. In much the same way we can listen to how a domain expert talks about the aggregates. For instance (totally bogus example):
"When a customer is registered we need to inform the supervisor of the CSR that initiated the request"
More loosely related to your example:
"When an incident takes place we need to capture some specific details regarding the incident. Depending on the type we need to capture different bits and validate that we have sufficient data to process our claim
Between these two we can see a distinct difference in how they are referring to interacting with the problem space. When a domain expert thinks of something in very broad terms I think it is prudent that we do the same.
On the other hand, should the conversation be more along the lines of this:
"When a car accident is registered we need to assign an assessor an wait for an assessment report that has to answer..."
Now we have something much more specific. These are, necessarily, mutually exclusive in that if they only ever talk about specifics then we go with "specific". If they first mention in broad terms and then specifics, we can also work in broad terms.
This is where our modeling is tricky to get right. It is the same nuance as we have in the Address as an aggregate vs value object "debate". It all depends on the context.
These things are going to be tricky and dependent on the domain in order to get right. As Eric Evans did mention: it may take a couple of models to get something that fits just right. This is necessarily so based on one's experience with the domain.

Ruminations on highly-scalable and modular distributed server side architectures

Mine is not really a question, it's more of a call for opinions - and perhaps this isn't even the right place to post it. Nevertheless, the community here is very informed, and there's no harm in trying...
I was thinking about ways to create a highly scalable and, above all, highly modular back-end architecture. For example, an entire back-end ecosystem for a large site that had the potential for future-proof evolution into a massive site.
This would entail a very high degree of separation of concerns, to the extent that not only could (say) the underling DB be replaced (ie from Oracle to MySQL) but the actual type of database could be replaced (ed SQL to KV, or vice versa).
I envision a situation where each sub-system exposes its own API within the back-end ecosystem. In this way, the API could remain constant, whilst the implementation could change (even radically) over time.
The system must be heterogeneous in that it's not tied to a specific language. It must be able to accommodate modules or entire sub-systems using different languages.
It then occurred to me that what I was imagining was simply the architecture of the web itself.
So here is my discussion point: apart from the overhead of using (mainly) text-based protocols is there any overriding reason why a complex back-end architecture should not be implemented in the manner I describe, or is there some strong rationale I'm missing for using communication protocols such as Twisted, AMQP, Thrift, etc?
UPDATE: Following a comment from #meagar, I should perhaps reformulate the question: are the clear advantages of using a very simple, flexible and well-understood architecture (ie all functionality exposed as a series RESTful APIs) enough to compensate the obvious performance hit incurred when using this architecture in a back-end context?
[code]the actual type of database could be replaced (ed SQL to KV, or vice versa).[/code]
And anyone who wrote a join between two tables will be sad. If you want the "ability" to switch to KV, then you should not expose an API richer than what KV can support.
The answer to your question depends on what it is you're trying to accomplish. You want to keep each module within reasonable reins. Use proper physical layering of code, use defined interfaces with side-effect contracts, use test cases for each success and failure case of each interface. That way, you can depend on things like "when user enters blah page, a user-blah fact is generated so that all registered fact listeners will be invoked." This allows you to extend the system without having direct calls from point A to point B, while still having some kind of control over widely disparate dependencies. (I hate code bases where you can't find-all to find all possible references to a symbol!)
However, the fact that we put lots of code and classes into a single system is because calling between systems is often very, very expensive. You want to think in terms of code modules making requests of each other where you can. The difference in timing between a function call and a REST call is something like one to a million (maybe you can get it as low as one to ten thousand, if you only count cycles, not wallclock time -- but I'm not so sure). Also, anything that goes on a wire in a datacenter may potentially suffer from packet loss, because there is no such thing as a 100% loss-free data center, no matter how hard you try. Packet loss means random latency spikes in the response time for your application.

Significant Challengers to OOP

From what I understand, OOP is the most commonly used paradigm for large scale projects. I also know that some smaller subsets of big systems use other paradigms (e.g. SQL, which is declarative), and I also realize that at lower levels of computing OOP isn't really feasible. But it seems to me that usually the pieces of higher level solutions are almost always put together in a OOP fashion.
Are there any scenarios where a truly non-OOP paradigm is actually a better choice for a largescale solution? Or is that unheard of these days?
I've wondered this ever since I've started studying CS; it's easy to get the feeling that OOP is some nirvana of programming that will never be surpassed.
In my opinion, the reason OOP is used so widely isn't so much that it's the right tool for the job. I think it's more that a solution can be described to the customer in a way that they understand.
A CAR is a VEHICLE that has an ENGINE. That's programming and real world all in one!
It's hard to comprehend anything that can fit the programming and real world quite so elegantly.
Linux is a large-scale project that's very much not OOP. And it wouldn't have a lot to gain from it either.
I think OOP has a good ring to it, because it has associated itself with good programming practices like encapsulation, data hiding, code reuse, modularity et.c. But these virtues are by no means unique to OOP.
You might have a look at Erlang, written by Joe Armstrong.
Wikipedia:
"Erlang is a general-purpose
concurrent programming language and
runtime system. The sequential subset
of Erlang is a functional language,
with strict evaluation, single
assignment, and dynamic typing."
Joe Armstrong:
“Because the problem with
object-oriented languages is they’ve
got all this implicit environment that
they carry around with them. You
wanted a banana but what you got was a
gorilla holding the banana and the
entire jungle.”
The promise of OOP was code reuse and easier maintenance. I am not sure it delivered. I see things such as dot net as being much the same as the C libraries we used to get fro various vendors. You can call that code reuse if you want. As for maintenance bad code is bad code. OOP did not help.
I'm the biggest fan of OOP, and I practice OOP every day.
It's the most natural way to write code, because it resembles the real life.
Though, I realize that the OOP's virtualization might cause performance issues.
Of course that depends on your design, the language and the platform you chose (systems written in Garbage collection based languages such as Java or C# might perform worse than systems which were written in C++ for example).
I guess in Real-time systems, procedural programming may be more appropriate.
Note that not all projects that claim to be OOP are in fact OOP. Sometimes the majority of the code is procedural, or the data model is anemic, and so on...
Zyx, you wrote, "Most of the systems use relational databases ..."
I'm afraid there's no such thing. The relational model will be 40 years old next year and has still never been implemented. I think you mean, "SQL databases." You should read anything by Fabian Pascal to understand the difference between a relational dbms and an SQL dbms.
" ... the relational model is usually chosen due to its popularity,"
True, it's popular.
" ... availability of tools,"
Alas without the main tool necessary: an implementation of the relational model.
" support,"
Yup, the relational model has fine support, I'm sure, but it's entirely unsupported by a dbms implementation.
" and the fact that the relational model is in fact a mathematical concept,"
Yes, it's a mathematical concept, but, not being implemented, it's largely restricted to the ivory towers. String theory is also a mathematical concept but I wouldn't implement a system with it.
In fact, despite it's being a methematical concept, it is certainly not a science (as in computer science) because it lacks the first requirement of any science: that it is falsifiable: there's no implementation of a relational dbms against which we can check its claims.
It's pure snake oil.
" ... contrary to OOP."
And contrary to OOP, the relational model has never been implemented.
Buy a book on SQL and get productive.
Leave the relational model to unproductive theorists.
See this and this. Apparently you can use C# with five different programming paradigms, C++ with three, etc.
Software construction is not akin to Fundamental Physics. Physics strive to describe reality using paradigms which may be challenged by new experimental data and/or theories. Physics is a science which searches for a "truth", in a way that Software construction doesn't.
Software construction is a business. You need to be productive, i.e. to achieve some goals for which someone will pay money. Paradigms are used because they are useful to produce software effectively. You don't need everyone to agree. If I do OOP and it's working well for me, I don't care if a "new" paradigm would potentially be 20% more useful to me if I had the time and money to learn it and later rethink the whole software structure I'm working in and redesign it from scratch.
Also, you may be using another paradigm and I'll still be happy, in the same way that I can make money running a Japanese food restaurant and you can make money with a Mexican food restaurant next door. I don't need to discuss with you whether Japanese food is better than Mexican food.
I doubt OOP is going away any time soon, it just fits our problems and mental models far too well.
What we're starting to see though is multi-paradigm approaches, with declarative and functional ideas being incorporated into object oriented designs. Most of the newer JVM languages are a good example of this (JavaFX, Scala, Clojure, etc.) as well as LINQ and F# on the .net platform.
It's important to note that I'm not talking about replacing OO here, but about complementing it.
JavaFX has shown that a declarative
solution goes beyond SQL and XSLT,
and can also be used for binding
properties and events between visual
components in a GUI
For fault tolerant and highly
concurrent systems, functional
programming is a very good fit,
as demonstrated by the Ericsson
AXD301 (programmed using Erlang)
So... as concurrency becomes more important and FP becomes more popular, I imagine that languages not supporting this paradigm will suffer. This includes many that are currently popular such as C++, Java and Ruby, though JavaScript should cope very nicely.
Using OOP makes the code easier to manage (as in modify/update/add new features) and understand. This is especially true with bigger projects. Because modules/objects encapsulate their data and operations on that data it is easier to comprehend the functionality and the big picture.
The benefit of OOP is that it is easier to discuss (with other developers/management/customer) a LogManager or OrderManager, each of which encompass specific functionality, then describing 'a group of methods that dump the data in file' and 'the methods that keep track of order details'.
So I guess OOP is helpful especially with big projects but there are always new concepts turning up so keep on lookout for new stuff in the future, evaluate and keep what is useful.
People like to think of various things as "objects" and classify them, so no doubt that OOP is so popular. However, there are some areas where OOP has not gained a bigger popularity. Most of the systems use relational databases rather than objective. Even if the second ones hold some notable records and are better for some types of tasks, the relational model is unsually chosen due to its popularity, availability of tools, support and the fact that the relational model is in fact a mathematical concept, contrary to OOP.
Another area where I have never seen OOP is the software building process. All the configuration and make scripts are procedural, partially because of the lack of the support for OOP in shell languages, partially because OOP is too complex for such tasks.
Slightly controversial opinion from me but I don't find OOP, at least of a kind that is popularly applied now, to be that helpful in producing the largest scale software in my particular domain (VFX, which is somewhat similar in scene organization and application state as games). I find it very useful on a medium to smaller scale. I have to be a bit careful here since I've invited some mobs in the past, but I should qualify that this is in my narrow experience in my particular type of domain.
The difficulty I've often found is that if you have all these small concrete objects encapsulating data, they now want to all talk to each other. The interactions between them can get extremely complex, like so (except much, much more complex in a real application spanning thousands of objects):
And this is not a dependency graph directly related to coupling so much as an "interaction graph". There could be abstractions to decouple these concrete objects from each other. Foo might not talk to Bar directly. It might instead talk to it through IBar or something of this sort. This graph would still connect Foo to Bar since, albeit being decoupled, they still talk to each other.
And all this communication between small and medium-sized objects which make up their own little ecosystem, if applied to the entire scale of a large codebase in my domain, can become extremely difficult to maintain. And it becomes so difficult to maintain because it's hard to reason about what happens with all these interactions between objects with respect to things like side effects.
Instead what I've found useful is to organize the overall codebase into completely independent, hefty subsystems that access a central "database". Each subsystem then inputs and outputs data. Some other subsystems might access the same data, but without any one system directly talking to each other.
... or this:
... and each individual system no longer attempts to encapsulate state. It doesn't try to become its own ecosystem. It instead reads and writes data in the central database.
Of course in the implementation of each subsystem, they might use a number of objects to help implement them. And that's where I find OOP very useful is in the implementation of these subsystems. But each of these subsystems constitutes a relatively medium to small-scale project, not too large, and it's at that medium to smaller scale that I find OOP very useful.
"Assembly-Line Programming" With Minimum Knowledge
This allows each subsystem to just focus on doing its thing with almost no knowledge of what's going on in the outside world. A developer focusing on physics can just sit down with the physics subsystem and know little about how the software works except that there's a central database from which he can retrieve things like motion components (just data) and transform them by applying physics to that data. And that makes his job very simple and makes it so he can do what he does best with the minimum knowledge of how everything else works. Input central data and output central data: that's all each subsystem has to do correctly for everything else to work. It's the closest thing I've found in my field to "assembly line programming" where each developer can do his thing with minimum knowledge about how the overall system works.
Testing is still also quite simple because of the narrow focus of each subsystem. We're no longer mocking concrete objects with dependency injection so much as generating a minimum amount of data relevant to a particular system and testing whether the particular system provides the correct output for a given input. With so few systems to test (just dozens can make up a complex software), it also reduces the number of tests required substantially.
Breaking Encapsulation
The system then turns into a rather flat pipeline transforming central application state through independent subsystems that are practically oblivious to each other's existence. One might sometimes push a central event to the database which another system processes, but that other system is still oblivious about where that event came from. I've found this is the key to tackling complexity at least in my domain, and it is effectively through an entity-component system.
Yet it resembles something closer to procedural or functional programming at the broad scale to decouple all these subsystems and let them work with minimal knowledge of the outside world since we're breaking encapsulation in order to achieve this and avoid requiring the systems to talk to each other. When you zoom in, then you might find your share of objects being used to implement any one of these subsystems, but at the broadest scale, the systems resembles something other than OOP.
Global Data
I have to admit that I was very hesitant about applying ECS at first to an architectural design in my domain since, first, it hadn't been done before to my knowledge in popular commercial competitors (3DS Max, SoftImage, etc), and second, it looks like a whole bunch of globally-accessible data.
I've found, however, that this is not a big problem. We can still very effectively maintain invariants, perhaps even better than before. The reason is due to the way the ECS organizes everything into systems and components. You can rest assured that an audio system won't try to mutate a motion component, e.g., not even under the hackiest of situations. Even with a poorly-coordinated team, it's very improbable that the ECS will degrade into something where you can no longer reason about which systems access which component, since it's rather obvious on paper and there are virtually no reasons whatsoever for a certain system to access an inappropriate component.
To the contrary it often removed many of the former temptations for hacky things with the data wide open since a lot of the hacky things done in our former codebase under loose coordination and crunch time was done in hasty attempts to x-ray abstractions and try to access the internals of the ecosystems of objects. The abstractions started to become leaky as a result of people, in a hurry, trying to just get and do things with the data they wanted to access. They were basically jumping through hoops trying to just access data which lead to interface designs degrading quickly.
There is something vaguely resembling encapsulation still just due to the way the system is organized since there's often only one system modifying a particular type of components (two in some exceptional cases). But they don't own that data, they don't provide functions to retrieve that data. The systems don't talk to each other. They all operate through the central ECS database (which is the only dependency that has to be injected into all these systems).
Flexibility and Extensibility
This is already widely-discussed in external resources about entity-component systems but they are extremely flexible at adapting to radically new design ideas
in hindsight, even concept-breaking ones like a suggestion for a creature which is a mammal, insect, and plant that sprouts leaves under sunlight all at once.
One of the reasons is because there are no central abstractions to break. You introduce some new components if you need more data for this or just create an entity which strings together the components required for a plant, mammal, and insect. The systems designed to process insect, mammal, and plant components then automatically pick it up and you might get the behavior you want without changing anything besides adding a line of code to instantiate an entity with a new combo of components. When you need whole new functionality, you just add a new system or modify an existing one.
What I haven't found discussed so much elsewhere is how much this eases maintenance even in scenarios when there are no concept-breaking design changes that we failed to anticipate. Even ignoring the flexibility of the ECS, it can really simplify things when your codebase reaches a certain scale.
Turning Objects Into Data
In a previous OOP-heavy codebase where I saw the difficulty of maintaining a codebase closer to the first graph above, the amount of code required exploded because the analogical Car in this diagram:
... had to be built as a completely separate subtype (class) implementing multiple interfaces. So we had an explosive number of objects in the system: a separate object for point lights from directional lights, a separate object for a fish eye camera from another, etc. We had thousands of objects implementing a few dozen abstract interfaces in endless combinations.
When I compared it to ECS, that required only hundreds and we were able to do the exact same things before using a small fraction of the code, because that turned the analogical Car entity into something that no longer requires its class. It turns into a simple collection of component data as a generalized instance of just one Entity type.
OOP Alternatives
So there are cases like this where OOP applied in excess at the broadest level of the design can start to really degrade maintainability. At the broadest birds-eye view of your system, it can help to flatten it and not try to model it so "deep" with objects interacting with objects interacting with objects, however abstractly.
Comparing the two systems I worked on in the past and now, the new one has more features but takes hundreds of thousands of LOC. The former required over 20 million LOC. Of course it's not the fairest comparison since the former one had a huge legacy, but if you take a slice of the two systems which are functionally quite equal without the legacy baggage (at least about as close to equal as we might get), the ECS takes a small fraction of the code to do the same thing, and partly because it dramatically reduces the number of classes there are in the system by turning them into collections (entities) of raw data (components) with hefty systems to process them instead of a boatload of small/medium objects.
Are there any scenarios where a truly non-OOP paradigm is actually a
better choice for a largescale solution? Or is that unheard of these
days?
It's far from unheard of. The system I'm describing above, for example, is widely used in games. It's quite rare in my field (most of the architectures in my field are COM-like with pure interfaces, and that's the type of architecture I worked on in the past), but I've found that peering over at what gamers are doing when designing an architecture made a world of difference in being able to create something that still remains very comprehensible at it grows and grows.
That said, some people consider ECS to be a type of object-oriented programming on its own. If so, it doesn't resemble OOP of a kind most of us would think of, since data (components and entities to compose them) and functionality (systems) are separated. It requires abandoning encapsulation at the broad system level which is often considered one of the most fundamental aspects of OOP.
High-Level Coding
But it seems to me that usually the pieces of higher level solutions
are almost always put together in a OOP fashion.
If you can piece together an application with very high-level code, then it tends to be rather small or medium in scale as far as the code your team has to maintain and can probably be assembled very effectively using OOP.
In my field in VFX, we often have to do things that are relatively low-level like raytracing, image processing, mesh processing, fluid dynamics, etc, and can't just piece these together from third party products since we're actually competing more in terms of what we can do at the low-level (users get more excited about cutting-edge, competitive production rendering improvements than, say, a nicer GUI). So there can be lots and lots of code ranging from very low-level shuffling of bits and bytes to very high-level code that scripters write through embedded scripting languages.
Interweb of Communication
But there comes a point with a large enough scale with any type of application, high-level or low-level or a combo, that revolves around a very complex central application state where I've found it no longer useful to try to encapsulate everything into objects. Doing so tends to multiply complexity and the difficulty to reason about what goes on due to the multiplied amount of interaction that goes on between everything. It no longer becomes so easy to reason about thousands of ecosystems talking to each other if there isn't a breaking point at a large enough scale where we stop modeling each thing as encapsulated ecosystems that have to talk to each other. Even if each one is individually simple, everything taken in as a whole can start to more than overwhelm the mind, and we often have to take a whole lot of that in to make changes and add new features and debug things and so forth if you try to revolve the design of an entire large-scale system solely around OOP principles. It can help to break free of encapsulation at some scale for at least some domains.
At that point it's not necessarily so useful anymore to, say, have a physics system encapsulate its own data (otherwise many things could want to talk to it and retrieve that data as well as initialize it with the appropriate input data), and that's where I found this alternative through ECS so helpful, since it turns the analogical physics system, and all such hefty systems, into a "central database transformer" or a "central database reader which outputs something new" which can now be oblivious about each other. Each system then starts to resemble more like a process in a flat pipeline than an object which forms a node in a very complex graph of communication.

Why do we need entity objects? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I really need to see some honest, thoughtful debate on the merits of the currently accepted enterprise application design paradigm.
I am not convinced that entity objects should exist.
By entity objects I mean the typical things we tend to build for our applications, like "Person", "Account", "Order", etc.
My current design philosophy is this:
All database access must be accomplished via stored procedures.
Whenever you need data, call a stored procedure and iterate over a SqlDataReader or the rows in a DataTable
(Note: I have also built enterprise applications with Java EE, java folks please substitute the equvalent for my .NET examples)
I am not anti-OO. I write lots of classes for different purposes, just not entities. I will admit that a large portion of the classes I write are static helper classes.
I am not building toys. I'm talking about large, high volume transactional applications deployed across multiple machines. Web applications, windows services, web services, b2b interaction, you name it.
I have used OR Mappers. I have written a few. I have used the Java EE stack, CSLA, and a few other equivalents. I have not only used them but actively developed and maintained these applications in production environments.
I have come to the battle-tested conclusion that entity objects are getting in our way, and our lives would be so much easier without them.
Consider this simple example: you get a support call about a certain page in your application that is not working correctly, maybe one of the fields is not being persisted like it should be. With my model, the developer assigned to find the problem opens exactly 3 files. An ASPX, an ASPX.CS and a SQL file with the stored procedure. The problem, which might be a missing parameter to the stored procedure call, takes minutes to solve. But with any entity model, you will invariably fire up the debugger, start stepping through code, and you may end up with 15-20 files open in Visual Studio. By the time you step down to the bottom of the stack, you forgot where you started. We can only keep so many things in our heads at one time. Software is incredibly complex without adding any unnecessary layers.
Development complexity and troubleshooting are just one side of my gripe.
Now let's talk about scalability.
Do developers realize that each and every time they write or modify any code that interacts with the database, they need to do a throrough analysis of the exact impact on the database? And not just the development copy, I mean a mimic of production, so you can see that the additional column you now require for your object just invalidated the current query plan and a report that was running in 1 second will now take 2 minutes, just because you added a single column to the select list? And it turns out that the index you now require is so big that the DBA is going to have to modify the physical layout of your files?
If you let people get too far away from the physical data store with an abstraction, they will create havoc with an application that needs to scale.
I am not a zealot. I can be convinced if I am wrong, and maybe I am, since there is such a strong push towards Linq to Sql, ADO.NET EF, Hibernate, Java EE, etc. Please think through your responses, if I am missing something I really want to know what it is, and why I should change my thinking.
[Edit]
It looks like this question is suddenly active again, so now that we have the new comment feature I have commented directly on several answers. Thanks for the replies, I think this is a healthy discussion.
I probably should have been more clear that I am talking about enterprise applications. I really can't comment on, say, a game that's running on someone's desktop, or a mobile app.
One thing I have to put up here at the top in response to several similar answers: orthogonality and separation of concerns often get cited as reasons to go entity/ORM. Stored procedures, to me, are the best example of separation of concerns that I can think of. If you disallow all other access to the database, other than via stored procedures, you could in theory redesign your entire data model and not break any code, so long as you maintained the inputs and outputs of the stored procedures. They are a perfect example of programming by contract (just so long as you avoid "select *" and document the result sets).
Ask someone who's been in the industry for a long time and has worked with long-lived applications: how many application and UI layers have come and gone while a database has lived on? How hard is it to tune and refactor a database when there are 4 or 5 different persistence layers generating SQL to get at the data? You can't change anything! ORMs or any code that generates SQL lock your database in stone.
I think it comes down to how complicated the "logic" of the application is, and where you have implemented it. If all your logic is in stored procedures, and all your application does is call those procedures and display the results, then developing entity objects is indeed a waste of time. But for an application where the objects have rich interactions with one another, and the database is just a persistence mechanism, there can be value to having those objects.
So, I'd say there is no one-size-fits-all answer. Developers do need to be aware that, sometimes, trying to be too OO can cause more problems than it solves.
Theory says that highly cohesive, loosely coupled implementations are the way forward.
So I suppose you are questioning that approach, namely separating concerns.
Should my aspx.cs file be interacting with the database, calling a sproc, and understanding IDataReader?
In a team environment, especially where you have less technical people dealing with the aspx portion of the application, I don't need these people being able to "touch" this stuff.
Separating my domain from my database protects me from structural changes in the database, surely a good thing? Sure database efficacy is absolutely important, so let someone who is most excellent at that stuff deal with that stuff, in one place, with as little impact on the rest of the system as possible.
Unless I am misunderstanding your approach, one structural change in the database could have a large impact area with the surface of your application. I see that this separation of concerns enables me and my team to minimise this. Also any new member of the team should understand this approach better.
Also, your approach seems to advocate the business logic of your application to reside in your database? This feels wrong to me, SQL is really good at querying data, and not, imho, expressing business logic.
Interesting thought though, although it feels one step away from SQL in the aspx, which from my bad old unstructured asp days, fills me with dread.
One reason - separating your domain model from your database model.
What I do is use Test Driven Development so I write my UI and Model layers first and the Data layer is mocked, so the UI and model is build around domain specific objects, then later I map these objects to what ever technology I'm using the the Data Layer. Its a bad idea to let the database structure determine the design of your application. Where possible write the app first and let that influence the structure of your database, not the other way around.
For me it boils down to I don't want my application to be concerned with how the data is stored. I'll probably get slapped for saying this...but your application is not your data, data is an artifact of the application. I want my application to be thinking in terms of Customers, Orders and Items, not a technology like DataSets, DataTables and DataRows...cuz who knows how long those will be around.
I agree that there is always a certain amount of coupling, but I prefer that coupling to reach upwards rather than downwards. I can tweak the limbs and leaves of a tree easier than I can alter it's trunk.
I tend to reserve sprocs for reporting as the queries do tend to get a little nastier than the applications general data access.
I also tend to think with proper unit testing early on that scenario's like that one column not being persisted is likely not to be a problem.
Eric,
You are dead on. For any really scalable / easily maintained / robust application the only real answer is to dispense with all the garbage and stick to the basics.
I've followed a similiar trajectory with my career and have come to the same conclusions. Of course, we're considered heretics and looked at funny. But my stuff works and works well.
Every line of code should be looked at with suspicion.
I would like to answer with an example similar to the one you proposed.
On my company I had to build a simple CRUD section for products, I build all my entities and a separate DAL. Later another developer had to change a related table and he even renamed several fields. The only file I had to change to update my form was the DAL for that table.
What (in my opinion) entities brings to a project is:
Ortogonality: Changes in one layer might not affect other layers (off course if you make a huge change on the database it would ripple through all the layers but most small changes won't).
Testability: You can test your logic with out touching your database. This increases performance on your tests (allowing you to run them more frequently).
Separation of concerns: In a big product you can assign the database to a DBA and he can optimize the hell out of it. Assign the Model to a business expert that has the knowledge necessary to design it. Assign individual forms to developers more experienced on webforms etc..
Finally I would like to add that most ORM mappers support stored procedures since that's what you are using.
Cheers.
I think you may be "biting off more than you can chew" on this topic. Ted Neward was not being flippant when he called it the "Vietnam of Computer Science".
One thing I can absolutely guarantee you is that it will change nobody's point of view on the matter, as has been proven so often on innumerable other blogs, forums, podcasts etc.
It's certainly ok to have open disucssion and debate about a controversial topic, it's just this one has been done so many times that both "sides" have agreed to disagree and just got on with writing software.
If you want to do some further reading on both sides, see articles on Ted's blog, Ayende Rahein, Jimmy Nilson, Scott Bellware, Alt.Net, Stephen Forte, Eric Evans etc.
#Dan, sorry, that's not the kind of thing I'm looking for. I know the theory. Your statement "is a very bad idea" is not backed up by a real example. We are trying to develop software in less time, with less people, with less mistakes, and we want the ability to easily make changes. Your multi-layer model, in my experience, is a negative in all of the above categories. Especially with regards to making the data model the last thing you do. The physical data model must be an important consideration from day 1.
I found your question really interesting.
Usually I need entities objects to encapsulate the business logic of an application. It would be really complicated and inadequate to push this logic into the data layer.
What would you do to avoid these entities objects? What solution do you have in mind?
Entity Objects can facilitate cacheing on the application layer. Good luck caching a datareader.
We should also talk about the notion what entities really are.
When I read through this discussion, I get the impression that most people here are looking at entities in the sense of an Anemic Domain Model.
A lot of people are considering the Anemic Domain Model as an antipattern!
There is value in rich domain models. That is what Domain Driven Design is all about.
I personally believe that OO is a way to conquer complexity. This means not only technical complexity (like data-access, ui-binding, security ...) but also complexity in the business domain!
If we can apply OO techniques to analyze, model, design and implement our business problems, this is a tremendous advantage for maintainability and extensibility of non-trivial applications!
There are differences between your entities and your tables. Entities should represent your model, tables just represent the data-aspect of your model!
It is true that data lives longer than apps, but consider this quote from David Laribee: Models are forever ... data is a happy side effect.
Some more links on this topic:
Why Setters and Getters are evil
Return of pure OO
POJO vs. NOJO
Super Models Part 2
TDD, Mocks and Design
Really interesting question. Honestly I can not prove why entities are good. But I can share my opinion why I like them. Code like
void exportOrder(Order order, String fileName){...};
is not concerned where order came from - from DB, from web request, from unit test, etc. It makes this method more explicitly declare what exactly it requires, instead of taking DataRow and documenting which columns it expects to have and which types they should be. Same applies if you implement it somehow as stored procedure - you still need to push record id to it, while it not necessary should be present in DB.
Implementation of this method would be done based on Order abstraction, not based on how exactly it is presented in DB. Most of such operations which I implemented really do not depend on how this data is stored. I do understand that some operations require coupling with DB structure for perfomance and scalability purposes, just in my experience there are not too much of them. In my experience very often it is enough to know that Person has .getFirstName() returning String, and .getAddress() returning Address, and address has .getZipCode(), etc - and do not care which tables are involed to store that data.
If you have to deal with such problems as you described, like when additional column breaks report perfomance, then for your tasks DB is a critical part, and you indeed should be as close as possible to it. While entities can provide some convenient abstractions they can hide some important details as well.
Scalability is interesting point here - most of websites which require enormous scalability (like facebook, livejournal, flickr) tend to use DB-ascetic approach, when DB is used as rare as possible and scalability issues are solved by caching, especially by RAM usage. http://highscalability.com/ has some interesting articles on it.
There are other good reasons for entity objects besides abstraction and loose coupling. One of the things I like most is the strong typing that you can't get with a DataReader or a DataTable. Another reason is that when done well, proper entity classes can make the code more maintanable by using first-class constructs for domain-specific terms that anyone looking at the code is likely to understand rather than a bunch of strings with field names in them used for indexing a DataRow. Stored procedures are really orthogonal to the use of an ORM since a lot of mapping frameworks give you the ability to map to sprocs.
I wouldn't consider sprocs + datareaders a substitute for a good ORM. With stored procedures, you're still constrained by, and tightly-coupled to, the procedure's type signature, which uses a different type system than the calling code. Stored procedures can be subject to modification to acommodate additional options and schema changes. An alternative to stored procedures in the case where the schema is subject to change is to use views--you can map objects to views and then re-map views to the underlying tables when you change them.
I can understand your aversion to ORMs if your experience mainly consists of Java EE and CSLA. You might want to have a look at LINQ to SQL, which is a very lightweight framework and is primarily a one-to-one mapping with the database tables but usually only needs minor extension for them to be full-blown business objects. LINQ to SQL can also map input and output objects to stored procedures' paramaters and results.
The ADO.NET Entity framework has the added advantage that your database tables can be viewed as entity classes inheriting from each other, or as columns from multiple tables aggregated into a single entity. If you need to change the schema, you can change the mapping from the conceptual model to the storage schema without changing the actual application code. And again, stored procedures can be used here.
I think that more IT projects in enterprises fail because of unmaintainability of the code or poor developer productivity (which can happen from, e.g., context switching between sproc-writing and app-writing) than scalability problems of an application.
I would also like to add to Dan's answer that separating both models could enable your application to be run on different database servers or even database models.
What if you need to scale your app by load balancing more than one web server? You could install the full app on all web servers, but a better solution is to have the web servers talk to an application server.
But if there aren't any entity objects, they won't have very much to talk about.
I'm not saying that you shouldn't write monoliths if its a simple, internal, short life application. But as soon as it gets moderately complex, or it should last a significant amount of time, you really need to think about a good design.
This saves time when it comes to maintaining it.
By splitting application logic from presentation logic and data access, and by passing DTOs between them, you decouple them. Allowing them to change independently.
You might find this post on comp.object interesting.
I'm not claiming to agree or disagree but it's interesting and (I think) relevant to this topic.
A question: How do you handle disconnected applications if all your business logic is trapped in the database?
In the type of Enterprise application I'm interested in, we have to deal with multiple sites, some of them must be able to function in a disconnected state.
If your business logic is encapsulated in a Domain layer that is simple to incorporate into various application types -say, as a dll- then I can build applications that are aware of the business rules and are able, when necessary, to apply them locally.
In keeping the Domain layer in stored procedures on the database you have to stick with a single type of application that needs a permanent line-of-sight to the database.
It's ok for a certain class of environments, but it certainly doesn't cover the whole spectrum of Enterprise applications.
#jdecuyper, one maxim I repeat to myself often is "if your business logic is not in your database, it is only a recommendation". I think Paul Nielson said that in one of his books. Application layers and UI come and go, but data usually lives for a very long time.
How do I avoid entity objects? Stored procedures mostly. I also freely admit that business logic tends to reach through all layers in an application whether you intend it to or not. A certain amount of coupling is inherent and unavoidable.
I have been thinking about this same thing a lot lately; I was a heavy user of CSLA for a while, and I love the purity of saying that "all of your business logic (or at least as much as is reasonably possible) is encapsulated in business entities".
I have seen the business entity model provide a lot of value in cases where the design of the database is different than the way you work with the data, which is the case in a lot of business software.
For example, the idea of a "customer" may consist of a main record in a Customer table, combined with all of the orders the customer has placed, as well as all the customer's employees and their contact information, and some of the properties of a customer and its children may be determined from lookup tables. It's really nice from a development standpoint to be able to work with the Customer as a single entity, since from a business perspective, the concept of Customer contains all of these things, and the relationships may or may not be enforced in the database.
While I appreciate the quote that "if your business rule is not in your database, it's only a suggestion", I also believe that you shouldn't design the database to enforce business rules, you should design it to be efficient, fast and normalized.
That said, as others have noted above, there is no "perfect design", the tool has to fit the job. But using business entities can really help with maintenance and productivity, since you know where to go to modify business logic, and objects can model real-world concepts in an intuitive way.
Eric,
No one is stopping you from choosing the framework/approach that you would wish. If you are going to go the "data driven/stored procedure-powered" path, then by all means, go for it! Especially if it really, really helps you deliver your applications on-spec and on-time.
The caveat being (a flipside to your question that is), ALL of your business rules should be on stored procedures, and your application is nothing more than a thin client.
That being said, same rules apply if you do your application in OOP : be consistent. Follow OOP's tenets, and that includes creating entity objects to represent your domain models.
The only real rule here is the word consistency. Nobody is stopping you from going DB-centric. No one is stopping you from doing old-school structured (aka, functional/procedural) programs. Hell, no one is stopping anybody from doing COBOL-style code. BUT an application has to be very, very consistent once going down this path, if it wishes to attain any degree of success.
I'm really not sure what you consider "Enterprise Applications". But I'm getting the impression you are defining it as an Internal Application where the RDBMS would be set in stone and the system wouldn't have to be interoperable with any other systems whether internal or external.
But what if you had a database with 100 tables which equate to 4 Stored Procedures for each table just for basic CRUD operations that's 400 stored procedures which need to be maintained and aren't strongly-typed so are susceptible to typos nor can be Unit Tested. What happens when you get a new CTO who is an Open Source Evangelist and wants to change the RDBMS from SQL Server to MySql?
A lot of software today whether Enterprise Applications or Products are using SOA and have some requirements for exposing Web Services, at least the software I am and have been involved with do.
Using your approach you would end up exposing a Serialized DataTable or DataRows. Now this may be deemed acceptable if the Client is guaranteed to be .NET and on an internal network. But when the Client is not known then you should be striving to Design an API which is intuitive and in most cases you would not want to be exposing the Full Database schema.
I certainly wouldn't want to explain to a Java developer what a DataTable is and how to use it. There's also the consideration of Bandwith and payload size and serialized DataTables, DataSets are very heavy.
There is no silver bullet with software design and it really depends on where the priorities lie, for me it's in Unit Testable code and loosely coupled components that can be easily consumed be any client.
just my 2 cents
I'd like to offer another angle to the problem of distance between OO and RDB: history.
Any software has a model of reality that is to some degree an abstraction of reality. No computer program can capture all the complexities of reality, and programs are written just to solve a set of problems from reality. Therefore any software model is a reduction of reality. Sometimes the software model forces reality to reduce itself. Like when you want the car rental company to reserve any car for you as long as it is blue and has alloys, but the operator can't comply because your request won't fit in the computer.
RDB comes from a very old tradition of putting information into tables, called accounting. Accounting was done on paper, then on punch cards, then in computers. But accounting is already a reduction of reality. Accounting has forced people to follow its system so long that it has become accepted reality. That's why it is relatively easy to make computer software for accounting, accounting has had its information model, long before the computer came along.
Given the importance of good accounting systems, and the acceptance you get from any business managers, these systems have become very advanced. The database foundations are now very solid and noone hesitates about keeping vital data in something so trustworthy.
I guess that OO must have come along when people have found that other aspects of reality are harder to model than accounting (which is already a model). OO has become a very successful idea, but persistance of OO data is relatively underdeveloped. RDB/Accounting has had easy wins, but OO is a much larger field (basically everything that isn't accounting).
So many of us have wanted to use OO but we still want safe storage of our data. What can be safer than to store our data the same way as the esteemed accounting system does? It is an enticing prospects, but we all run into the same pitfalls. Very few have taken the trouble to think of OO persistence compared to the massive efforts by the RDB industry, who has had the benefit of accounting's tradition and position.
Prevayler and db4o are some suggestions, I'm sure there are others I haven't heard of, but none have seemed to get half the press as, say, hibernation.
Storing your objects in good old files doesn't even seem to be taken seriously for multiuser applications, and especially web applications.
In my everyday struggle to close the chasm between OO and RDB I use OO as much as possible but try to keep inheritance to a minimum. I don't often use SPs. I'll use the advanced query stuff only in aspects that look like accounting.
I'll be happily supprised when the chasm is closed for good. I think the solution will come when Oracle launches something like "Oracle Object Instance Base". To really catch on, it will have to have a reassuring name.
Not a lot of time at the moment, but just off the top of my head...
The entity model lets you give a consistent interface to the database (and other possible systems) even beyond what a stored procedure interface can do. By using enterprise-wide business models you can make sure that all applications affect the data consistently which is a VERY important thing. Otherwise you end up with bad data, which is just plain evil.
If you only have one application then you don't really have an "enterprise" system, regardless of how big that application or your data are. In that case you can use an approach similar to what you talk about. Just be aware of the work that will be needed if you decide to grow your systems in the future.
Here are a few things that you should keep in mind (IMO) though:
Generated SQL code is bad
(exceptions to follow). Sorry, I
know that a lot of people think that
it's a huge time saver, but I've
never found a system that could
generate more efficient code than
what I could write and often the
code is just plain horrible. You
also often end up generating a ton
of SQL code that never gets used.
The exception here is very simple
patterns, like maybe lookup tables.
A lot of people get carried away on
it though.
Entities <> Tables (or even logical data model entities necessarily). A data model often has data rules that should be enforced as closely to the database as possible which can include rules around how table rows relate to each other or other similar rules that are too complex for declarative RI. These should be handled in stored procedures. If all of your stored procedures are simple CRUD procs, you can't do that. On top of that, the CRUD model usually creates performance issues because it doesn't minimize round trips across the network to the database. That's often the biggest bottleneck in an enterprise application.
Sometimes, your application and data layer are not that tightly coupled. For example, you may have a telephone billing application. You later create a separate application which monitors phone usage to a) better advertise to you b) optimise your phone plan.
These applications have different concerns and data requirements (even the data is coming out of the same database), they would drive different designs. Your code base can end up an absolute mess (in either application) and a nightmare to maintain if you let the database drive the code.
Applications that have domain logic separated from the data storage logic are adaptable to any kind of data source (database or otherwise) or UI (web or windows(or linux etc.)) application.
Your pretty much stuck in your database, which isn't bad if your with a company who is satisfied with the current database system your using. However, because databases evolve overtime there might be a new database system that is really neat and new that your company wants to use. What if they wanted to switch to a web services method of data access (like Service Orientated architecture sometime does). You might have to port your stored procedures all over the place.
Also the domain logic abstracts away the UI, which can be more important in large complex systems that have ever evolving UIs (especially when they are constantly searching for more customers).
Also, while I agree that there is no definitive answer to the question of stored procedures versus domain logic. I'm in the domain logic camp (and I think they are winning over time), because I believe that elaborate stored procedures are harder to maintain than elaborate domain logic. But that's a whole other debate
I think that you are just used to writing a specific kind of application, and solving a certain kind of problem. You seem to be attacking this from a "database first" perspective. There are lots of developers out there where data is persisted to a DB but performance is not a top priority. In lots of cases putting an abstraction over the persistence layer simplifies code greatly and the performance cost is a non-issue.
Whatever you are doing, it's not OOP. It's not wrong, it's just not OOP, and it doesn't make sense to apply your solutions to every othe problem out there.
Interesting question. A couple thoughts:
How would you unit test if all of your business logic was in your database?
Wouldn't changes to your database structure, specifically ones that affect several pages in your app, be a major hassle to change throughout the app?
Good Question!
One approach I rather like is to create an iterator/generator object that emits instances of objects that are relevant to a specific context. Usually this object wraps some underlying database access stuff, but I don't need to know that when using it.
For example,
An AnswerIterator object generates AnswerIterator.Answer objects. Under the hood it's iterating over a SQL Statement to fetch all the answers, and another SQL statement to fetch all related comments. But when using the iterator I just use the Answer object that has the minimum properties for this context. With a little bit of skeleton code this becomes almost trivial to do.
I've found that this works well when I have a huge dataset to work on, and when done right, it gives me small, transient objects that are relatively easy to test.
It's basically a thin veneer over the Database Access stuff, but it still gives me the flexibility of abstracting it when I need to.
The objects in my apps tend to relate one-to-one to the database, but I'm finding using Linq To Sql rather than sprocs makes it much easier writing complicated queries, especially being able to build them up using the deferred execution. e.g. from r in Images.User.Ratings where etc. This saves me trying to work out several join statements in sql, and having Skip & Take for paging also simplifies the code rather than having to embed the row_number & 'over' code.
Why stop at entity objects? If you don't see the value with entity objects in an enterprise level app, then just do your data access in a purely functional/procedural language and wire it up to a UI. Why not just cut out all the OO "fluff"?