How to avoid violating SRP when transforming data


How can I avoid violating the "single responsibility principle" when writing a class which should transform data from format A to format B?
There are exactly two reasons for such a class to change, because the specification of either format A or format B can change.

So, I am new here, and post this with the caveat that I am stretching in proposing my answer.
It seems to me that one can only take concepts like the "Single Responsibility Principle" so far. To me, the single responsibility of your transform class is to manage the translation of data from one format specification to another. Some thoughts:
In the strictest of terms, if one of the formats changes, you theoretically would still need the previous version of the format converter for converting legacy data (backward compatibility). You never know when someone might not get the memo about the change in formats. Or you might run into a batch of data in format A v1 in the basement somewhere. Therefore, the single responsibility of your class would remain the conversion of data from format A1.0 to format B1.0.
If one of the specifications DOES change, you now have to create a new VERSION of your class, right? Let's say someone modifies the spec for format A. Now you need a class which manages the transformation of data from format A1.1 to B1.0. You have created a new class with a single responsibility.
While in the scope of your project you may not consider backward compatibility a requirement, in terms of the SRP concept my understanding is that a change in one or both of the format specs requires the definition of a new class, and does not, in the strictest terms of the academic theory, imply more than one responsibility.
Lastly, if you think of the mapping of data from one format to another as the single responsibility of the class, then a change to EITHER spec still only necessitates a change to the single job of the class.
A final example by way of illustration. Assume the responsibility of my class is to transform a particular shade of red into a particular shade of pink. Then one day the chief designer decides he wants a brighter pink as the output. One side of my spec has changed, but the responsibility of my class has not. The next day, it is decided at the very highest corporate level that the new red standard is more like maroon. Now my input spec has changed, but the responsibility of my class has not. I might decide to create a new class and retain version 1.0 for holdovers, or I might just update the existing version. In either case, the class still has a single responsibility: mapping the red specification to the pink specification.
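The versioning idea above can be sketched in Java. This is only an illustration under stated assumptions: the two formats are invented stand-ins (format A as a "key=value;key=value" string, format B as an ordered map), the change in A 1.1 (a '|' separator) is made up, and all class names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins: format A is a "key=value;key=value" string,
// format B is an ordered map. Neither comes from the original question.
interface AToBConverter {
    Map<String, String> convert(String formatA);
}

// Handles format A 1.0 only. Its single responsibility is the A1.0 -> B1.0 mapping.
final class A10ToB10Converter implements AToBConverter {
    @Override
    public Map<String, String> convert(String formatA) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : formatA.split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {            // silently skip malformed pairs in this sketch
                result.put(kv[0].trim(), kv[1].trim());
            }
        }
        return result;
    }
}

// Assumed spec change: A 1.1 separates pairs with '|' instead of ';'.
// A new class owns the new mapping; the 1.0 converter stays available for legacy data.
final class A11ToB10Converter implements AToBConverter {
    private final AToBConverter legacy = new A10ToB10Converter();

    @Override
    public Map<String, String> convert(String formatA) {
        return legacy.convert(formatA.replace('|', ';'));
    }
}
```

Callers that depend only on AToBConverter can pick the converter that matches the version of the incoming data, so old batches in format A 1.0 remain convertible after the spec moves on.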

Related

SOLID - One class per function? [duplicate]

I am quite confused with the Single Responsibility Principle. The Principle states that there should only be one reason for the class to change.
The problem which I am facing is, any change to a method or any logic change in doing things would change the class. For example, consider the following class:
class Person {
    public void eat() { }
    public void walk() { }
    public void breathe() { }
    public void run() { }
    public void driveCar(Car car) { }
}
Uncle Bob describes it as there should ONLY be a single person/Actor responsible for the change. I have the following two questions:
For the above class who is the actor/Person who can be responsible for change?
Wouldn't any change in the logic of eating, breathing or walking change the class Person? So doesn't that mean that every method is a reason to change as it's logic to doing things might change?

What is a reason to change
For the above class who is the actor/Person who can be responsible for the change?
An Actor is a user (including clients, stakeholders, developers, organizations) or an external system. We could argue about whether people are systems, but that is neither here nor there.
See also: Use case.
Wouldn't any change in the logic of eating, breathing or walking change the class Person? So doesn't that mean that every method is a reason to change as its logic to doing things might change?
No, a method is not a reason to change. A method is something that can change... but why would it? What would trigger the developer to change it?
Part of the single responsibility principle is that code should interact with at most one external system. Remember that not all actors are external systems, although some are. I think most people will find this part of the SRP easy to understand because interaction with an external system is something we can see in the code.
However, that is not enough. For example, if your code has to compute taxes, you can hardcode the tax rate in your code. That way, it is not interacting with any external system (it is just using a constant). However, one tax reform later, the government has been revealed as a reason to change your code.
Something you should be able to do is interchange external systems (perhaps with some additional coding effort). For example, changing from one database engine to another. However, we do not want one of these changes to translate into a total rewrite of the code. Changes should not propagate, and making a change should not break something else. To ensure that, we want all the code that deals with the database engine (in this example) to be isolated.
Things that change for the same reasons should be grouped together, things that change for different reasons should be separated.
-- Robert C Martin
We can do something similar with the government example above. We probably do not want the software to read the minutes of Congress; instead, we can have it read a configuration file. Now the external system is the file system, and there would be code to interact with it, and that code should not interact with anything else.
How do we identify those reasons to change?
Your code is defined by a set of requirements. Some are functional, others not. If any of those requirements change, your code has to change. A reason to change requirements is a reason to change your code.
Note: It is possible that you do not have all your requirements documented; however, an undocumented requirement is still a requirement.
Then, you need to know where those requirements come from. Who or what could change them? Those are your reasons for change. It could be a change in the politics of the company, it could be a feature we are adding, it could be a new law, it could be that we are migrating to a different database engine or a different operating system, translating to another language, adapting to another country, etc.
Some of those things are external systems with which your code interacts (e.g. the database engine), some are not (the politics of the company).
What to do with responsibilities
You want to isolate them. So you will have code that interacts with the database, and nothing else. And you will have code that implements business rules, and nothing else. And so on.
Realize that even though the implementation of each part of your code will depend on something external, their interface does not have to. Thus, define interfaces and inject dependencies, so that you can change the implementation of each part without having to change the others… that is, the implementation of parts of your code should not be a reason to change the implementation of other parts of your code.
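As a rough sketch of "define interfaces and inject dependencies", applied to the tax example from earlier in this answer: the names TaxRateProvider, FixedTaxRateProvider and TaxCalculator are hypothetical and not taken from any library, and the fixed-rate provider merely stands in for whatever reads the rate from a configuration file.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical interface: where the tax rate comes from is hidden behind it.
interface TaxRateProvider {
    BigDecimal currentRate();
}

// One possible implementation: a fixed rate, for example one read from
// a configuration file at startup (the file reading itself is omitted here).
final class FixedTaxRateProvider implements TaxRateProvider {
    private final BigDecimal rate;

    FixedTaxRateProvider(BigDecimal rate) { this.rate = rate; }

    @Override
    public BigDecimal currentRate() { return rate; }
}

// Business rule only: it changes when the calculation rule changes,
// not when the source of the rate changes.
final class TaxCalculator {
    private final TaxRateProvider rates;

    TaxCalculator(TaxRateProvider rates) { this.rates = rates; }

    BigDecimal taxOn(BigDecimal amount) {
        return amount.multiply(rates.currentRate()).setScale(2, RoundingMode.HALF_UP);
    }
}
```

With this split, a tax reform touches only the provider (or the configuration it reads), while a change in how tax is computed touches only TaxCalculator.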
Note: No part of your code should have multiple responsibilities. Have parts of your code deal with each responsibility, and have part of your code with the responsibility of bringing other parts together. Similarly, if a part of your code has no responsibility… there is no reason to keep it. Thus, every part of your code should have exactly one responsibility.
For your code, ask yourself: what are the requirements of the Person class? Are they complete? Where do they come from? Why would they change?
Recommended viewing
For a more authoritative explanation of the single responsibility principle, see Robert C Martin - The Single Responsibility Principle (51 minutes, 8 seconds, English language) at the Norwegian Developers Conference, 2015.

Interesting question. The quote from "Uncle Bob" Martin is:
A class should have one, and only one, reason to change.
One could interpret this as saying that your Person class has five reasons to change: you might want to change the eat method, or change the walk method, or the breathe method, or the run method, or the driveCar method. But this is too narrow, and doesn't really capture what Martin meant by a "reason to change".
A reason to change a class means a human programmer's motivation for changing it. You would not change the eat method simply because you were motivated to change the eat method; you would change it to achieve some goal regarding a desired behaviour of your program.
If the Person class models a person for some sort of simulation, then your motivation for changing it would be that you want "to change how people's actions are modelled in the simulation". Every change you make to the class would be motivated by that reason, whether you changed one method or more than one; so the Person class has only one "reason" to change, fulfilling the SRP.
If the Person class had some other methods such as for drawing the person on the screen, then you might also want "to change the graphical appearance of your simulated people". This would be a completely different motivation than the motivation to change the way your simulation models people's actions, so the class would have two responsibilities, violating SRP.
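A minimal sketch of that separation, assuming a toy simulation in which a person only tracks an energy level. The energy field, its numbers and the PersonRenderer class are invented for illustration and are not part of the question's code.

```java
// Simulation behaviour only: changes when the rules for modelling people's actions change.
final class Person {
    private int energy = 10;     // hypothetical state, chosen just for this sketch

    void eat()  { energy += 5; }
    void walk() { energy -= 1; }
    void run()  { energy -= 3; }

    int energy() { return energy; }
}

// Presentation only: changes when the appearance of simulated people changes.
final class PersonRenderer {
    String render(Person person) {
        return "Person[energy=" + person.energy() + "]";
    }
}
```

A request to alter how people look now lands in PersonRenderer, and a request to alter how they behave lands in Person, so each class answers to one kind of motivation.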

Method requires specific subtype but collection is of base abstract type. What is wrong?

Recently I ran into a situation like this. I'm generalizing the problem because I think it relates more to structural design than to the specific problem.
General problem
There is a hierarchy of classes: an abstract base class Base and some concretions D1, D2, D3 that inherit from it. The class A contains a collection of objects of type Base. A requires a computation from some service class B, but the B.process() method accepts only a collection of type D1. Let's say that is important, because if the input collection contains any other type, the value returned is just wrong.
A has an interface that allows clients to add elements to the internal collection, which is not exposed in any other way. The classes in the hierarchy are constructed by those same clients, which pass the new values to A; A does not have enough context to construct them itself.
Attempts, questions and thoughts
The major concern for me was the need to determine at runtime the type of each element in A's collection, so that A can filter the right ones and pass them to B.process(). Even if it is possible (it is in my particular problem, more on that later), it just seems wrong! I think the object that holds references to the abstract base class shouldn't have to know the concrete types of the instances it holds.
I tried to:
Change the parameter type to B.process(c: Base[]) so A doesn't have to downcast, but it doesn't solve anything: A still needs to filter the elements or the computation will be wrong.
Pass the complete collection Base[] to B.process(), but that just defers the problem of selection/downcasting to B.
Put a process() method in Base so D1 can override the behavior (well-known polymorphism). The problem here is that a process() returning a SomeValue type only makes sense for D1.
Separate the interface that adds elements, so that a more specific A.addD1Element(e: D1) method could put D1 objects in a different collection and pass that to B. It should work, but it also looks... I don't know, weird. If method overloading based on parameter type is possible, at least the process won't be so cumbersome for clients of the class.
Just separate the D1 class from the hierarchy. This is a more aggressive variation of the previous one. The issue is that D1 seems related to the whole hierarchy except for the specific requirements of B.
Those were some of my thoughts on the problem.
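For concreteness, here is a rough Java sketch of the fourth attempt (a D1-specific add overload that feeds a separate, typed collection). The fields amount and extra and the arithmetic inside B.process() are placeholders invented for this sketch; only the names Base, D1, D2, A and B come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

abstract class Base {
    final double amount;            // hypothetical field shared by every subtype
    Base(double amount) { this.amount = amount; }
}

final class D1 extends Base {
    final int extra;                // hypothetical D1-only data that B relies on
    D1(double amount, int extra) { super(amount); this.extra = extra; }
}

final class D2 extends Base {
    D2(double amount) { super(amount); }
}

// Service that is only meaningful for D1 elements.
final class B {
    double process(List<D1> items) {
        double result = 0;
        for (D1 item : items) {
            result += item.amount * item.extra;   // placeholder computation
        }
        return result;
    }
}

final class A {
    private final List<Base> all = new ArrayList<>();
    private final List<D1> d1Only = new ArrayList<>();

    void add(Base element) { all.add(element); }

    // Selected at compile time when the caller's reference has static type D1.
    void add(D1 element) {
        all.add(element);
        d1Only.add(element);
    }

    // Generic processing over the whole hierarchy still works.
    double totalAmount() {
        double total = 0;
        for (Base element : all) total += element.amount;
        return total;
    }

    // No runtime type check: B receives a collection that is already typed as D1.
    double processD1(B service) { return service.process(d1Only); }
}
```

One caveat of this sketch: Java resolves overloads from the static type of the argument, so a D1 passed through a variable declared as Base goes to add(Base) and never reaches the D1 list. Whether that is acceptable depends on how clients hold their objects.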
For instance, the language used supports checking the type of an object at runtime (instanceof), and it is easy to filter the collection based on that check. But as I said, my question is more related to the paradigm. What about a language, say C++, where such a check is less handy?
So what could be a solution to this kind of problem? What kind of refactoring or design pattern could be applied so that the problem becomes easy to deal with or simply fades away?
This question looks related, but I believe mine is more general (although I provide a more specific context). The most upvoted answer suggests splitting the elements into different collections. This is also something I'm considering, but it forces a change to A's implementation every time a new type is added.
Context (problem in action)
I'm asking in a general way because the general form of the problem really intrigues me, but I know that most of the time a design can be analyzed only in the context of the particular problem it tries to solve.
The problem at hand is similar to this:
A is a class (some kind of entity, like a DDD entity) that models a sort of agreement or debt a customer incurs for a service. It has different costs, including a monthly payment. Base and its related classes are Payments of different types. They share a lot in common, although most of it is data (date, amount, interest, etc.); but there is at least one type of payment that has different, additional information: the monthly payment (D1). Those payments need to be analyzed carefully, so a different class (B) is responsible for that, using more contextual information and all the payments of that type at once. The service needs the additional data that is specific to those payments, so it cannot receive an abstract Payment type (at least not in that design). Other payments don't have the specific information MonthlyPayment does, so they cannot produce the values that the business requires and that B generates (those values make no sense for other payment types).
All payments are stored in the same collection so other methods of the class can process all payments in a generic way.
This is mostly the context. I think the design is not the best, but I fail to see a better one.
Maybe separate only MonthlyPayment (D1) into a different collection, as described earlier? But it is not the only payment that requires additional processing (it is the most complex, though), so I could end up with a different collection for every payment type and no hierarchy at all. Right now there are four payment types and two of them require additional, specific analysis, but more types can be added later, and the need to modify the implementation every time a new type is added persists.
Is this more discrete approach of different collections by type a better one here? The abstract base class Payment can still be used for payments that can be manipulated through the common interface. I could also use a layer supertype or something like that to allow reuse of common functionality (the language allows a kind of mixin as well) and stop using the base class as the root of a hierarchy.
Uf. I am sorry for the length of the text. I hope it is at least readable and clear. Thank you very much in advance.

Class diagram - multiple classes uses same class

I am designing a class diagram for an assignment. In this design, I use a separate class called Currency to define currency values and their functionality. At least four other classes have to use this Currency class.
How can I show that in the class diagram? I mean, do I need to draw relationships (connecting lines) from the Currency class to all the others?
Is there a better way ?
What am I doing wrong here ?

There is nothing wrong, and reusability of a class is valuable. Actually, that's a standard situation.
If you use this class in another class as an attribute, you have two options to depict that:
Draw an association relationship (line) from the using class to the class that is used.
Put the attribute in the proper compartment of the using class and, as the type of the attribute (after a colon), put the name of the used class.
The benefit of the first approach is that you immediately see the dependency between the classes.
If you use a class, but not directly as an attribute type, use whichever other relationship type best suits the situation you want to describe.
As I imagine, one of your concerns is that you'll have a lot of relationships pointing to your class (in your case Currency). Don't worry about that. You don't have to put everything in a single diagram. Put a full specification of your class on one diagram, with those relationships where it uses something else, and then put only the class box with a name (without any compartment) on the diagrams defining those elements that use your class. It will make your model readable. And with the support of a CASE tool you will be able to see all relationships and dependencies of this class anyway. By the way, that's how the UML specification is written. Look, for example, at how Namespace is used in the diagrams there (and many others as well).
Of course I'm not suggesting creating one diagram per one element to define it. No. Collect them in logical Packages (hey - that's exactly what Packages are for!) and make a class diagram per Package. If the Package becomes too large - you might need to split it into smaller subpackages.
For Currency your Package would be probably something like Utils. It can also contain other elements like Date, Address etc. Note - these are typical examples, probably every analyst/designer/programmer sooner or later has to cope with those elements. If you build them well, you'll be really able to reuse them in future applications as well.
One last thought. While you build "package-based" class diagrams, you might also need a diagram that shows just specific parts coming from several Packages to clarify some bit of your system/business/whatsoever. This is also absolutely fine. Again, a benefit of a CASE tool here is that it keeps your model consistent.

How do you determine how coarse or fine-grained a 'responsibility' should be when using the single responsibility principle?

In the SRP, a 'responsibility' is usually described as 'a reason to change', so that each class (or object?) should have only one reason someone should have to go in there and change it.
But if you take this to the extreme fine-grain you could say that an object adding two numbers together is a responsibility and a possible reason to change. Therefore the object should contain no other logic, because it would produce another reason for change.
I'm curious whether anyone out there has any strategies for 'scoping' the single-responsibility principle that are slightly less subjective?

It comes down to the context of what you are modeling. I've done some extensive writing and presenting on the SOLID principles and I specifically address your question in my discussions of Single Responsibility.
The following first appeared in the Jan/Feb 2010 issue of Code Magazine, and is available online at "S.O.L.I.D. Software Development, One Step at a Time"
The Single Responsibility Principle says that a class should have one, and only one, reason to change.
This may seem counter-intuitive at first. Wouldn’t it be easier to say that a class should only have one reason to exist? Actually, no: one reason to exist could very easily be taken to an extreme that would cause more harm than good. If you take it to that extreme and build classes that have one reason to exist, you may end up with only one method per class. This would cause a large sprawl of classes for even the most simple of processes, causing the system to be difficult to understand and difficult to change.
The reason that a class should have one reason to change, instead of one reason to exist, is the business context in which you are building the system. Even if two concepts are logically different, the business context in which they are needed may necessitate them becoming one and the same. The key point of deciding when a class should change is not based on a purely logical separation of concepts, but rather the business’s perception of the concept. When the business perception and context has changed, then you have a reason to change the class. To understand what responsibilities a single class should have, you need to first understand what concept should be encapsulated by that class and where you expect the implementation details of that concept to change.
Consider an engine in a car, for example. Do you care about the inner working of the engine? Do you care that you have a specific size of piston, camshaft, fuel injector, etc? Or, do you only care that the engine operates as expected when you get in the car? The answer, of course, depends entirely on the context in which you need to use the engine.
If you are a mechanic working in an auto shop, you probably care about the inner workings of the engine. You need to know the specific model, the various part sizes, and other specifications of the engine. If you don’t have this information available, you likely cannot service the engine appropriately. However, if you are an average everyday person that only needs transportation from point A to point B, you will likely not need that level of information. The notion of the individual pistons, spark plugs, pulleys, belts, etc., is almost meaningless to you. You only care that the car you are driving has an engine and that it performs correctly.
The engine example drives straight to the heart of the Single Responsibility Principle. The contexts of driving the car vs. servicing the engine provide two different notions of what should and should not be a single concept (a reason for change). In the context of servicing the engine, every individual part needs to be separate. You need to code them as single classes and ensure they are all up to their individual specifications. In the context of driving a car, though, the engine is a single concept that does not need to be broken down any further. You would likely have a single class called Engine, in this case. In either case, the context has determined what the appropriate separation of responsibilities is.

I tend to think in terms of the "velocity of change" of the business requirements rather than the "reason to change".
The question is indeed how likely things are to change together, not whether they could change or not.
The difference is subtle, but it helps me. Let's consider the example on Wikipedia about the reporting engine:
If the likelihood that the content and the template of the report change at the same time is high, it can be one component, because they are apparently related. (It can also be two.)
But if the likelihood that the content changes without the template is significant, then it must be two components, because they are not related. (It would be dangerous to have one.)
But I know that's a personal interpretation of the SRP.
Also, a second technique that I like is: "Describe your class in one sentence". It usually helps me to identify if there is a clear responsibility or not.

I don't see performing a task like adding two numbers together as a responsibility. Responsibilities come in different shapes and sizes but they certainly should be seen as something larger than performing a single function.
To understand this better, it is probably helpful to clearly differentiate between what a class is responsible for and what a method does. A method should "do only one thing" (e.g. add two numbers, though for most purposes '+' is a method that does that already), while a class should present a single clear "responsibility" to its consumers. Its responsibility is at a much higher level than a method.
A class like Repository has a clear and singular responsibility. It has multiple methods like Save and Load, but a clear responsibility to provide persistence support for Person entities. A class may also co-ordinate and/or abstract the responsibilities of dependent classes, again presenting this as a single responsibility to other consuming classes.
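A minimal Java sketch of such a repository follows. The names PersonEntity, PersonRepository and InMemoryPersonRepository are hypothetical, and the in-memory map is just a stand-in for real persistence.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical entity with just enough data for the example.
final class PersonEntity {
    final int id;
    final String name;

    PersonEntity(int id, String name) { this.id = id; this.name = name; }
}

// Two methods, one responsibility: persistence of PersonEntity objects.
interface PersonRepository {
    void save(PersonEntity person);
    Optional<PersonEntity> load(int id);
}

// In-memory stand-in; a database-backed implementation would expose the same interface.
final class InMemoryPersonRepository implements PersonRepository {
    private final Map<Integer, PersonEntity> store = new HashMap<>();

    @Override
    public void save(PersonEntity person) { store.put(person.id, person); }

    @Override
    public Optional<PersonEntity> load(int id) { return Optional.ofNullable(store.get(id)); }
}
```

Both methods would change for the same reason (how Person entities are stored), which is why having two of them does not split the responsibility.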
The bottom line is that if the application of SRP is leading to single-method classes whose whole purpose seems to be just to wrap the functionality of that method in a class, then SRP is not being applied correctly.

A simple rule of thumb I use is that the level or granularity of responsibility should match the level or granularity of the "entity" in question. Obviously the purpose of a method will always be more precise than that of a class, or service, or component.
A good strategy for evaluating the level of responsibility can be to use an appropriate metaphor. If you can relate what you are doing to something that exists in the real world, it can help give you another view of the problem you're trying to solve, including being able to identify appropriate levels of abstraction and responsibility.

@Derick Bailey: nice explanation
Some additions: it is totally acceptable that the application of SRP is context-based.
The question still remains: are there any objective ways to define whether a given class violates SRP?
Some design contexts are quite obvious (like the car example by Derick), but otherwise the context in which a class's behaviour has to be defined often remains fuzzy.
For such cases, it may well be helpful to analyse the fuzzy class behaviour by splitting its responsibilities into different classes and then measuring the impact of the new behavioural and structural relations that emerge from the split.
As soon as the split is done, the reasons to keep the responsibilities split or to merge them back into a single responsibility become obvious.
I have applied this approach and it has led to good results for me.
But my search for 'objective ways of defining a class responsibility' still continues.

I respectfully disagree with Chris Nicola's statement above that "a class should present a single clear "responsibility" to its consumers".
I think SRP is about having a good design inside the class, not about the class's consumers.
To me it's not very clear what a responsibility is, and the proof is the number of questions that this concept raises.
"single reason to change"
or
"if the description contains the word "and" then it needs to be split"
leads to the question: where is the limit? In the end, any class with 2 public methods has 2 reasons to change, doesn't it?
For me, the true SRP leads to the Facade pattern, where you have a class that simply delegates the calls to other classes.
For example:
class Modem
send()
receive()
Refactors to ==>
class ModemSender
class ModemReceiver
+
class Modem
send() -> ModemSender.send()
receive() -> ModemReceiver.receive()
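Spelled out in Java, the refactoring above might look roughly like this. The shared Queue that plays the role of the line is an assumption made only so the sketch can run without hardware; the class names follow the pseudocode.

```java
import java.util.Queue;

// Sending only.
final class ModemSender {
    private final Queue<String> line;

    ModemSender(Queue<String> line) { this.line = line; }

    void send(String data) { line.add(data); }
}

// Receiving only.
final class ModemReceiver {
    private final Queue<String> line;

    ModemReceiver(Queue<String> line) { this.line = line; }

    // Returns null when nothing is pending on the line.
    String receive() { return line.poll(); }
}

// Facade: keeps the original Modem interface and only delegates.
final class Modem {
    private final ModemSender sender;
    private final ModemReceiver receiver;

    // The shared queue is a hypothetical loopback line, not a real device.
    Modem(Queue<String> line) {
        this.sender = new ModemSender(line);
        this.receiver = new ModemReceiver(line);
    }

    void send(String data) { sender.send(data); }

    String receive() { return receiver.receive(); }
}
```

Constructing a Modem with a java.util.ArrayDeque gives a loopback: whatever is sent comes back from receive() in order.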
Opinions are welcome

Best practice for naming subclasses

I am often in a situation where I have a concept represented by an interface or class, and then I have a series of subclasses/subinterfaces which extend it.
For example:
A generic "DoiGraphNode"
A "DoiGraphNode" representing a resource
A "DoiGraphNode" representing a Java resource
A "DoiGraphNode" with an associated path, etc., etc.
I can think of three naming conventions, and would appreciate comments on how to choose.
Option 1: Always start with the name of the concept.
Thus: DoiGraphNode, DoiGraphNodeResource, DoiGraphNodeJavaResource, DoiGraphNodeWithPath, etc.
Pro: It is very clear what I am dealing with, it is easy to see all the options I have
Con: Not very natural? Everything looks the same?
Option 2: Put the special stuff in the beginning.
Thus: DoiGraphNode, ResourceDoiGraphNode, JavaResourceDoiGraphNode, PathBaseDoiGraphNode,
etc., etc.
Pro: It is very clear when I see it in the code
Con: Finding it could be difficult, especially if I don't remember the name, lack of visual consistency
Option 3: Put the special stuff and remove some of the redundant text
Thus: DoiGraphNode, ResourceNode, JavaResourceNode, GraphNodeWithPath
Pro: Not that much to write and read
Con: Looks like cr*p, very inconsistent, may conflict with other names

Name them for what they are.
If naming them is hard or ambiguous, it's often a sign that the Class is doing too much (Single Responsibility Principle).
To avoid naming conflicts, choose your namespaces appropriately.
Personally, I'd use 3

Use whatever you like, it's a subjective thing. The important thing is to make clear what each class represents, and the names should be such that the inheritance relationships make sense. I don't really think it's all that important to encode the relationships in the names, though; that's what documentation is for (and if your names are appropriate for the objects, people should be able to make good guesses as to what inherits from what).
For what it's worth, I usually use option 3, and from my experience looking at other people's code option 2 is probably more prevalent than option 1.

You could find some guidance in a coding standards document, for example there is the IDesign document for C# here.
Personally, I prefer option 2. This is generally the way the .NET Framework names its objects. For instance look at attribute classes. They all end in Attribute (TestMethodAttribute). The same goes for EventHandlers: OnClickEventHandler is a recommended name for an event handler that handles the Click event.
I usually try to follow this in designing my own code and interfaces. Thus an IUnitWriter produces a StringUnitWriter and a DataTableUnitWriter. This way I always know what their base class is and it reads more naturally. Self-documenting code is the end-goal for all agile developers so it seems to work well for me!

I usually name similar to option 1, especially when the classes will be used polymorphically. My reasoning is that the most important bit of information is listed first (i.e. the fact that the subclass is basically what the ancestor is, with (usually) extensions 'added').
I like this option also because when sorting lists of class names, the related classes will be listed together. I.e. I usually name the translation unit (file name) the same as the class name, so related class files will naturally be listed together. Similarly, this is useful with incremental search.
Although I tended to use option 2 earlier in my programming career, I avoid it now because, as you say, it is 'inconsistent' and does not seem very orthogonal.
I often use option 3 when the subclass provides substantial extension or specification, or if the names would be rather long.
For example, my file system name classes are derived from String, but they greatly extend the String class and have a significantly different use/meaning:
Directory_entry_name derived from String adds extensive functionality.
File_name derived from Directory_entry_name has rather specialized functions.
Directory_name derived from Directory_entry_name also has rather specialized functions.
Also along with option 1, I usually use an unqualified name for an interface class.
For example, I might have a class inheritance chain:
Text (an interface)
Text_abstract (abstract (base) generalization class)
Text_ASCII (concrete class specific for ASCII coding)
Text_unicode (concrete class specific for unicode coding)
I rather like that the interface and the abstract base class automatically appear first in the sorted list.

Option three more logically follows from the concept of inheritance. Since you're specializing the interface or class, the name should show that it's no longer using the base implementation (if one exists).
There are a multitude of tools to see what a class inherits from, so a concise name indicating the real function of the class will go farther than trying to pack too much type information into the name.
