When is a class too big or too small? [closed]

When is a class too big or too small? [closed] - oop

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I recently had my code reviewed by a senior developer in my company. He criticized my design for using too many classes. I am interested to hear your reactions.
I was tasked to create a service which produces an xml file as a result of manipulating 3 other xml files. Let's name these aaa.xml, bbb.xml and ccc.xml. The service works in two phases. In phase one it scrubs aaa.xml against bbb.xml. In the second phase, it merges the product of phase one with ccc.xml to produce a final result.
I chose a design with three classes: an XmlService class which used two other classes, a scrubber class and a merger class. I kept the scrubbing and merging classes separate because the both classes were large and featured distinct logic.
I thought my approach was good because it kept my classes small and cohesive. My approach also helped to control the size of my test class.
The senior developer asserted that the scrubbing and merging classes would only be used by the XmlService class, and should therefore be part of it. He felt this would make the XMLService cohesive and this is what being cohesive means according to him. He also feels that breaking up classes this way makes them loose cohesiveness.
The irony is I tried to break these classes to achieve cohesiveness. What do you think? Who is right or wrong? Are we both right? Thank you for your suggestions.

If you follow the single responsibility principle (and based on the tone of your question, I think you do follow it), then the answer is clear:
A class is too big when it does more than one thing; and
A class is too small when it fails to fulfill its purpose.
That's very broad and indeed subjective -- hence the struggle with your colleague. On your side, you can argue:
There's absolutely no problem in creating additional classes -- It's a non-issue, compile-wise and runtime-wise.
The current implementation of the service may suggest that these classes "belong" to it, but that may change.
You can test each functionality separately.
You can apply dependency injection.
You ease the cognitive load of understanding the inner working of the service, because its code is smaller and better organized.
Furthermore, I think your boss has a misguided understanding of cohesion. Think of it as focus: the narrower the focus of your program, the higher the cohesion. If the code on your satellite classes is merged within the service class, the latter becomes less focused (less cohesive). It's generally accepted that higher cohesion is preferred over lower cohesion. Try to enlighten his/her view about it.

Cohesion is a measure of how strongly related is the functionality within a body of code. That said, if merging and scrubbing aren't strongly related, including them both in the same class reduces cohesion.
The Single Responsibility Principle is also worth mentioning here. Creating a class for the sole purpose of scrubbing and another for the sole purpose of merging follows this principle.
I'd say your approach is the better of the two.

What would you name the classes? ScrubXml and MergeXml are nice names. ProcessXML and ScrubAndMergeXml aren't, the first being too general and the second having a conjunction. As long as none of the classes rely at all on the internals of one or the others (i.e., low coupling), you've got a good design there.
Names are very useful in determining cohesion. A cohesive module does one thing, and therefore has a simple specific name. A module that does more than one thing is less cohesive, and needs a more general or more complicated name.
One problem with merging functionality in X into Y if X is only used by Y is the reductio ad absurdam: if your call graph is acyclic, you'll wind up with all your functionality in one class.

As someone who is coming back from the GodClass building fest of several years in duration, and now trying very hard to avoid that mistake in the future; the error of making a 6000 to 10000 line single source unit with a single class, with over 250 methods, 200 data fields, and some methods over 100 lines, the single responsibility principle looks like something worth defending against the predilections of your unenlightened supervisor.
If however, your supervisor and you are disagreeing over a matter of whether 2000 lines of code belong in one class or three, then I think you're both closer to sane, than I was. Maybe it's a matter of scale and perspective. Some object oriented programming aficionados like a certain "Coefficient of Standalone" per class, in direct violation of the generally understood ideas about how to improve cohesion and minimize coupling.
A set of 3 classes that work well together is, objectively, a more object-oriented system, than a single class that does the same thing. one could argue that if you write an application using only one class, that the application is not really object oriented at all.

If the scrubber and merger are not meaningful outside the context of the main class, then I agree with your reviewer, particularly if you've had to expose any implementation details in order to allow this separation. If you're using a language supporting nested private classes or something similar, that might be a reasonable alternative to maintain the logical separation without exposing implementation details to outside consumers of the main class.

This is a very subjective area, and will be different depending on coding and style guidelines, and who approves your code.
That said, if your defense of your design didn't hold up, and your senior team member still insisted on merging your classes, then you have to compromise:
You've already got the logic separated out. Write the one service class and keep the methods separate like other good design, and then write a glue method. Add some comments above each method explaining how they could easily be partitioned to multiple classes if the need arises in the future.

Related

Why use classes instead of functions? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
I do know some advantages to classes such as variable and function scopes, but other than that is just seems easier to me to have groups of functions rather than to have many instances and abstractions of classes. So why is the "norm" to group similar functions in a class?

Simple, non-OOP programs may be one
long list of commands. More complex
programs will group lists of commands
into functions or subroutines each of
which might perform a particular task.
With designs of this sort, it is
common for the program's data to be
accessible from any part of the
program. As programs grow in size,
allowing any function to modify any
piece of data means that bugs can have
wide-reaching effects.
In contrast, the object-oriented
approach encourages the programmer to
place data where it is not directly
accessible by the rest of the program.
Instead the data is accessed by
calling specially written functions,
commonly called methods, which are
either bundled in with the data or
inherited from "class objects" and act
as the intermediaries for retrieving
or modifying those data. The
programming construct that combines
data with a set of methods for
accessing and managing those data is
called an object.
Advantages of OOP programming:
MAINTAINABILITY Object-oriented programming methods make code more maintainable. Identifying the source of errors is easier because objects are self-contained.
REUSABILITY Because objects contain both data and methods that act on data, objects can be thought of as self-contained black boxes. This feature makes it easy to reuse code in new systems.Messages provide a predefined interface to an object's data and functionality. With this interface, an object can be used in any context.
SCALABILITY Object-oriented programs are also scalable. As an object's interface provides a road map for reusing the object in new software, and provides all the information needed to replace the object without affecting other code. This way aging code can be replaced with faster algorithms and newer technology.

The point of OOP is not to 'group similar functions in a class'. If this is all you're doing then you're not doing OOP (despite using an OO language). Having classes instead of just a bunch of functions has a side effect of 'variable and function scopes' that you mention, but I see it just as a side effect.
OOP is about such concepts as encapsulation, inheritance, polymorphism, abstraction and many others. It is a specific way of software design, a specific way of mapping a problem to a software solution.

Grouping functions in a class is by no means the norm. Let me share some of the things I have learned through experimentation with different languages and paradigms.
I think the core concept here is that of a namespace. Namespaces are so useful that they are present in almost on any programming language.
Namespaces can help you overcome some common problems and model various patterns that appear in many domains, e.g., avoiding name collisions, hiding details, representing hierarchies, define access control, grouping related symbols (functions or data), define a context or scope ... and I'm sure there are more applications.
Classes are a type of namespace, and the specific properties of classes vary from language to language and sometimes from version to version of the same language, e.g., some provide access modifiers, some do not; some allow inheritance from multiple classes, others do not. People have been trying to find the magic mix of features that will be the most useful and that in part explains the plethora of available options in different programming languages.
So, why use classes, because they help solve certain kinds of problems in a way that seems natural or maybe intuitive. Every time we write a computer program we're trying to capture the essence of the problem and if the problem can be modeled by using some of the patterns mentioned above then it makes perfect sense to use those features of a language that help you do that.
As the problem becomes better understood you might realize that certain parts of the program could be better implemented by using a different paradigm/feature/pattern and then it's time for refactoring. Most programs I have had the chance to work on keep evolving until either the money/resources run out or when we arrive at the point of diminishing returns, many times you have something that's good enough for now and there's little incentive to keep working on it.

It's not the norm, it's just one way of doing it. Classes group methods (functions) AND data together, based on the concept of encapsulation.
For lager projects it often becomes easier to group things this way. Many people find it easier to conceptualizes the problem with objects.

There are many reasons to use classes, not the least of which is encapsulation of logic. Objects more closely match the world we live in, and are thus often more intuitive than other methodologies. Consider a car, a car has properties like body color, interior color, engine horsepower, features, current mileage, etc.. It also has methods, like Start (), TurnRight(.30), ApplyBrakes(.50). It has events like the ding when you open your car door with the keys in the ignition.
Probably the biggest reason is that most applications seem to have a graphical component these days and most of the libraries for graphical user interface are implemented with object models.
Polymorphism is probably a big reason, too. The ability to treat multiple types of objects generically is quite helpful.
If you are a mathematician, a functional style may be more intuitive, ML, F#. If you’re interacting with data in a predictable format, a declarative style would be better like SQL or LINQ.

In simple words, it seems to me that (apart from everything everyone is saying) that classes are best suited for large projects, especially those implemented by more than one programmer to facilitate keeping things tidy; as using functions in such situations can become rather cumbersome.
Otherwise, for simple programs/projects that you would implement yourself to do one thing or another, then functions would do nicely.

How to avoid creating huge classes [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
Stackoverflow users,
How do you keep yourself from creating large classes with large bodied methods. When deadlines are tight, you end up trying to hack things together and it ends up being a mess that needs refactoring.
One way is to start with test driven development and that lends itself to good class design as well as SRP (Single Responsibility Principle).
I also see developers double clicking on controls and typing out line after line in the event method that gets fired.
Any suggestions?

I guess it depends on your internal processes as much as anything.
In my company, we practise peer review, and all code that gets comitted must be 'buddied in' by another developer, who you have to explain your code to.
Time constraints are one thing, but if I review code that has heinously long classes, then I won't agree to the check-in.
It's hard to get used to at first, but at the end of the day, it's better for everyone.
Also, having a senior developer who is a champion for good class design, and is willing and able to give examples, helps tremendously.
Finally, we often do a coding 'show and tell' session where we get to show off our work to our peers, it behooves us not to do this with ugly code!

Use a tool like Resharper and the Extract Method command.

Long classes is one bad code smell of many possible.
Remedying overly large classes by creating lots of small ones may create its own problems. New engineers on your project may find it difficult to follow the flow of the code to work out what happens where. One artifact of this problem can be very tall call stacks, execution nesting through many small classes.

Another suggestion is to do only what is asked. Don't play the "What if" game and try to overdesign a solution. This has the "Keep it simple, stupid" idea behind it.

We're a java and maven shop, and one of the...I guess you could say forensic methods we use are the excellent FindBugs, PMD and javancss plugins. All will give warnings about excessive length in a method, and the cyclomatic complexity calculations can be very eye opening.

The single most important step for me to avoid large classes that often violate SRP was to use a simple dependency injection framework. This freed me from thinking too much about how to wire things together. I only use constructor injection to keep the design free from cycles. Tools like Resharper help to initialize fields from constructor arguments. This combination leads to a near zero overhead for creating and wiring up new classes, and implicitly encourages me to structure behavior in much more detail.
This all works best if data is kept separate from behavior, and your language supports constructs like events that can be used to decouple communication that flows in the downward direction of the dependency graph.

use some static code analysis tools in your automated builds and write/configure/use some rules so that for example someone has to write a justification when he/she breaks it..

Good definition for "coherence" [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 6 years ago.
Improve this question
I'm trying to tell someone his code is not "coherent" in the sense that it serves multiple purposes. I don't think I can explain it very well, so I'm looking for a good reference and/or definition.

I think the correct term is cohesion.
In computer programming, cohesion is a measure of how strongly-related and focused the various responsibilities of a software module are. Cohesion is an ordinal type of measurement and is usually expressed as "high cohesion" or "low cohesion" when being discussed.
Modules with high cohesion tend to be preferable because high cohesion is associated with several desirable traits of software including robustness, reliability, reusability, and understandability whereas low cohesion is associated with undesirable traits such as being difficult to maintain, difficult to test, difficult to reuse, and even difficult to understand.

I had Code Complete by Steve McConnell next to my computer (ie the programmers bible) with the page open explaining cohesion so I thought I'd share,
Cohesion arose from structured design
and is usually discussed in the same
context as coupling. Cohesion refers
to how closely all the routines in a
class or all the code in a routine
support a central purpose-how focused
the class is. Classes that contain
strongly related functionality are
described as having strong cohesion,
and the heuristic goal is to make
cohesion as strong as possible.

I use the term “separation of concerns” to explain this while refactoring. Often when code is fairly new, things will get lumped in together as the separate concerns are not clear at first.
One easy way to illustrate this to your co worker would be to ask them to write test cases for the code. This should illustrate that the code is not clear or coherent.
Another good phrase to use is that functions/objects “should do one thing, and do it well”, this has implications in everything from the object/method names to the overall architecture of the system.

In addition to the answers given so far, a simple way to think of high cohesion is lack of duplication of functionality, and clear seperation of related functionality into distinct modules, components or classes. Thus if you want a function similar to another function, and you cut and paste and subsequently modify a copy of the code, you are reducing cohesion. If you modify the the original to handle the new case, where the new case is clearly related to the existing functionality, you are increasing cohesion. Put another way, if your program has to do a given thing, no matter how times or in how many places, for maximum cohesion there should only be once piece of code that does that thing. At the same, a given class, module or component should have a single area of responsibility. Lumping unrelated functionality into a single class or component also reduces cohesion.
As CodeWiki says, cohesion is typically discussed with coupling, where the two can act in opposition to one another, particularly where strict interfaces aren't carefully planned. Many of the googled articles on cohesion relate to OO design, but cohesion and coupling are not restricted to OO.

Marked by an orderly, logical, and aesthetically consistent relation of parts; "a coherent argument" - from http://www.websters-online-dictionary.org/definition/coherent

Number of classes in large projects

I'm just wondering about large projects - say an airlines reservation system, how many classes/objects it is likely to have.
Objects: customer, airplane, airport, route, ticket, order. That's about all I can think of. The project will probably be hundreds of thousands of lines of code, so are there likely to be more classes (that do un-object related things)?
If so, how many classes are there likely to (roughly) be? How many objects (as per my list so not 10,000 customer objects, how many different names) are there likely to be?

There's really no magic formula for calculating optimal number of classes. The architecture you described above may create a very, very simple airline reservation system. As you continue to refactor, add more features, and accommodate special cases, you could end up with many more classes, e.g., MealPreference, CouponCode, Terminal, Gate, Airline, Baggage, BaggageTransfer, RainCheck, FlightUpgrade etc.
As you should (if you want to be agile), only code exactly what you need at the time, planning ahead for ease of extension. However, any project is going to grow in unanticipated ways over time.

For a real world airline reservation system? Thousands. Easily.
I'd guess around half of them are "infrastructure" classes - mostly related to persistence, logging, integration, etc. Maybe a few hundred domain classes (Airline, Airplane, Flight, Passenger, FrequentFlyer, MaintenanceSchedule, WeatherDelay, etc). And then another half of them would be UI related - controllers, views, view models, etc. to support both customer and internal apps.

The number of classes will be only statistical and it knowing it won't help you to establish any best practice, but I would say it can go to thousands of classes.
What it's important is to keep in mind the best practices and naming conventions, it is important to have a good package structure and name your classes according to its purpose, also keep in mind a high level of cohesion for your classes.
So other than satisfying your curiosity the number doesn't matter.

The number of classes is really relative to the design patterns and architecture you build.
For instance, with a simple console application that adds two numbers together, you can have anywhere from 1 to 10 (rough guess) based on the implementation and abstraction used.
You really can't judge how many "objects" an application is going to contain, without knowing the architecture and patterns.

I agree that it is very dependent on a lot of things; code language, application architecture, number of 3rd party extensions (which typically contain a lot of classes that don't actually ever get called by the app, but are included with .dll or .jar files), and a lot on the habit of the developer and if they tend to use giant monolithic classes, or break down everything into an interface, abstract, simple/stub, and real implementation.
However if your just looking for statistics, I was recently working on a large project for General Motors that had I'd say roughly 3000 classes, plus 200 JSP pages, and a few dozen CSS and JS files supporting the UI. Plugged into that you have DB drivers, Spring framework, a couple apache commons libraries, Hibernate ORM, Mule, JSTL, probably some other core components... the web server itself...

This is entirely dependent on the abstraction you use. In my experience, the closer you follow the problem domain (the closer you try to make your code read like what a business person would say), the more classes you'll have.

All the other answers you've received are wise: basically, "It depends."
But in another sense, there is an optimal number of classes you should have for a given number of methods; that is, given a number of methods, there is a specific number of classes (within a non-hierarchical encapsulation context, and to a close approximation) which minimises the potential structural complexity of those methods. (The actual number is given by the second law of encapsulation, and is equal to the square root of the number of methods divided by the number of public methods per class).
So then the question becomes: how many methods will you need? By analogy to the above, in which methods are encapsulated within classes, so blocks of code are encapsulation within methods, and so the number of methods is fixed by the same second law above.
So then the quesiton is: how many code blocks will I have?
That's answered by the other responses you've received: it depends.
For more, see Encapsulation Theory here:
www.EdmundKirwan.com
Regards,
Ed.

How do I break my procedural coding habits? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I recently read an interesting comment on an OOP related question in which one user objected to creating a "Manager" class:
Please remove the word manager
from your vocabulary when talking
about class names. The name of the
class should be descriptive of its'
purpose. Manager is just another word
for dumping ground. Any
functionality will fit there. The word
has been the cause of many extremely
bad designs
This comment embodies my struggle to become a good object-oriented developer. I have been doing procedural code for a long time at an organization with only procedural coders. It seems like the main strategy behind the relatively little OO code we produce is to break the problem down into classes that are easily identifiable as discrete units and then put the left over/generalized bits in a "Manager" class.
How can I break my procedural habits (like the Manager class)? Most OO articles/books, etc. use examples of problems that are inherently easy to transform into object groups (e.g., Vehicle -> Car) and thus do not provide much guidance for breaking down more complex systems.

First of all, I'd stop acting like procedural code is wrong. It's the right tool for some jobs. OO is also the right tool for some jobs. So is functional. Each paradigm is just a different point of view of computation, and exists because it's convenient for certain problems, not because it's the only right way to program. In principle, all three paradigms are mathematically equivalent, so use whichever one best maps to the problem domain. IMHO, if using a multiparadigm language it's even ok to blend paradigms within a module if different subproblems are best modeled by different worldviews.
Secondly, I'd read up on design patterns. It's hard to understand OO without some examples of the real-world problems it's good for solving. Head First Design Patterns is a good read, as it answers a lot of the "why" of OO.

Becoming good at OO takes years of practice and study of good OO code, ideally with a mentor. Remember that OO is just one means to an end. That being said, here are some general guidelines that work for me:
Favor composition over inheritance. Read and re-read the first chapter of the GoF book.
Obey the Law of Demeter ("tell, don't ask")
Try to use inheritance only to achieve polymorphism. When you extend one class from another, do so with the idea that you'll be invoking the behavior of that class through a reference to the base class. ALL the public methods of the base class should make sense for the subclass.
Don't get hung up on modeling. Build a working prototype to inform your design.
Embrace refactoring. Read the first few chapters of Fowler's book.

The single responsibility principle helps me break objects into manageable classes that make sense.
Each object should do one thing, and do it well without exposing how it works internally to other objects that need to use it.

A 'manager' class will often:
Interogate something's state
Make a decision based on that state
As an antidote or contrast to that, Object-Oriented design would encourage you to design class APIs where you "tell don't ask" the class itself to do things itself (and to encapsulate its own state): for more about "tell don't ask" see e.g. here and here (and maybe someone else has a better explanation of "tell don't ask" but these are first two articles that Google found for me).
It seems like the main strategy the little OO code we produce is to break the problem down into classes that are easily identifiable as discrete units and then put the left over/generalized bits in a "Manager" class.
That may well be true even at the best of times. Coplien talked about this towards the end of his Advanced C++: Programming Styles and Idioms book: he said that in a system, you tend to have:
Self-contained objects
And, "transactions", which act on other objects
Take, for example, an airplane (and I'm sorry for giving you another vehicular example; I'm paraphrasing him):
The 'objects' might include the ailerons, the rudder, and the thrust
The 'manager' or autpilot would implement various commands or transactions
For example, the "turn right" transaction includes:
flaps.right.up()
flaps.left.down()
rudder.right()
thrust.increase()
So I think it's true that you have transactions, which cut across or use the various relatively-passive 'objects'; in an application, for example, the "whatever" user-command will end up being implemented by (and therefore, invoking) various objects from every layer (e.g. the UI, the middle layer, and the DB layer).
So I think it's true that to a certain extent you will have 'bits left over'; it's a matter of degree though: perhaps you ought to want as much of the code as possible to be self-contained, and encapsulating, and everything ... and the bits left over, which use (or depend on) everything else, should be given/using an API which hides as much as possible and which does as much as possible, and which therefore takes as much responsibility (implementation details) as possible away from the so-called manager.
Unfortunately I've only read of this concept in that one book (Advanced C++) and can't link you to something online for a clearer explanation than this paraphrase of mine.

Reading and then practicing OO principles is what works for me. Head First Object-Oriented Analysis & Design works you through examples to make a solution that is OO and then ways to make the solution better.

You can learn good object-oriented design principles by studying design patterns. Code Complete 2 is a good book to read on the topic. Naturally, the best way to ingrain good programming principles into your mind is to practice them constantly by applying them to your own coding projects.

How can I break my procedural habits (like the Manager class)?
Make a class for what the manager is managing (for example, if you have a ConnectionManager class, make a class for a Connection). Move everything into that class.
The reason "manager" is a poor name in OOP is that one of the core ideas in OOP is that objects should manage themselves.
Don't be afraid to make small classes. Coming from a procedural background, you may think it isn't worth the effort to make a class unless it's a thousand lines of code and is some core concept in your domain. Think smaller. A ten line class is totally valid. Make little classes where you see they make sense (a Date, a MailingAddress) and then work your way up by composing classes out of those.
As you start to partition little pieces of your codebase into classes, the remaining procedural code soup will shrink. In that shrinking pool, you'll start to see other things that can be classes. Continue until the pool is empty.

How many OOP programmers does it take to change a light bulb?
None, the light bulb changes itself.
;)

You can play around with an OO language that has very bad procedural support like Smalltalk. The message sending paradigm will force you into OO thinking.

i think you should start it with a good plan.
planning using CLASS Diagrams would be a good start.
you should identify the ENTITIES needed in the applicaiton,
then define each entitie's ATTRIBUTES, and METHODS.
if there are repeated ones, you could now re-define your entities
in a way that inheritance could be done, to avoid redundancy.
:D.

I have a three step process, this is one that I have gone through successfully myself. Later I met an ex-teacher turned programmer (now very experienced) who explained to me exactly why this method worked so well, there's some psychology involved but it's essentially all about maintaining control and confidence as you learn. Here it is:
Learn what test driven development (TDD) is. You can comfortably do this with procedural code so you don't need to start working with objects yet if you don't want to. The second step depends on this.
Pick up a copy of Refactoring: Improving the Design of Existing Code by Martin Fowler. It's essentially a catalogue of little changes that you can make to existing code. You can't refactor properly without tests though. What this allows you to do is to mess with the code without worrying that everything will break. Tests and refactoring take away the paranoia and sense that you don't know what will happen, which is incredibly liberating. You left to basically play around. As you get more confident with that start exploring mocks for testing the interactions between objects.
Now comes the big that most people, mistakenly start with, it's good stuff but it should really come third. At this point you can should reading about design patterns, code smells (that's a good one to Google) and object oriented design principles. Also learn about user stories or use cases as these give you good initial candidate classes when writing new applications, which is a good solution to the "where do I start?" problem when writing apps.
And that's it! Proven goodness! Let me know how it goes.

My eureka moment for understanding object-oriented design was when I read Eric Evans' book "Domain-Driven Design: Tackling Complexity in the Heart of Software". Or the "Domain Driven Design Quickly" mini-book (which is available online as a free PDF) if you are cheap or impatient. :)
Any time you have a "Manager" class or any static singleton instances, you are probably building a procedural design.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas