A question I've been struggling with a lot lately is how, in my opinion, Inversion of Control breaks encapsulation and can easily lead to side effects in a program. At the same time, some of the big advantages of IoC are loose coupling and modularity, and it makes unit testing a class much easier under Test Driven Development (I think TDD is really pushing IoC in the industry).
Here is my argument against IoC.
If the injected types are immutable and pure (for example, primitive types), then IoC is acceptable. However, if they are impure and can modify the state of the program, or hold their own state, then side effects can easily occur.
Take the following C# example (part pseudocode):
public class FileSearcher : IFileSearcher
{
    private string searchPath; // cannot be readonly: SetSearchPath mutates it

    public void SetSearchPath(string path)
    {
        searchPath = path;
    }

    public List<string> FindFiles(string searchPattern)
    {
        // ...search for files matching searchPattern, starting at searchPath
    }
}
public class PlayListViewer
{
    private readonly IFileSearcher searcher;

    public PlayListViewer(string playlistName, IFileSearcher searcher)
    {
        this.searcher = searcher;
        searcher.SetSearchPath($"playlists/{playlistName}");
    }

    public List<string> FindSongNames()
    {
        return searcher.FindFiles("*.mp3|*.wav|*.flac")
            .Select(f => Path.GetFileName(f))
            .ToList();
    }

    // ...other methods
}
public class Program
{
    public static void Main()
    {
        var searcher = new FileSearcher();
        var viewer = new PlayListViewer("Hits 2021", searcher);

        searcher.SetSearchPath("C:/Users");         // Messes up the viewer's search path
        var pictures = searcher.FindFiles("*.jpg"); // Reusing the searcher for something else

        viewer
            .FindSongNames()
            .ForEach(s => Console.WriteLine(s));    // WRONG SONGS
    }
}
In the (very uncreative) example above, the PlayListViewer has a method for finding the songs within a playlist. Its constructor sets the correct search path for the playlist on the injected IFileSearcher, but the user of the class then overwrote that path. Now, when they try to find the songs in the playlist, the results are incorrect.
The users of a class do not always know the implementation of the class they're using, and don't realize the side effects they cause by mutating the objects they passed in.
Some other simple examples of this:
The Date class in Java is not immutable and has a setDate method (now deprecated). The following could occur:
date = new Date(2021, 10, 1)
a = new A(date)
a.SomethingInteresting() // Adds 1 year to the date using setDate
b = new B(date)          // No longer the correct date
I/O abstractions such as streams:
audioInput = new MemoryStream()
gainStage = new DSPGain(audioInput)
audioInput.write(....)
audioInput.close()
gainStage.run() //Error because memory stream is already closed
etc...
Other issues can come up if the object gets passed to multiple classes that use it concurrently across different threads. In these cases a user might not know or realize that class X is internally launching work on a different thread.
I think the simple, functional answer would be to write only pure functions and immutable classes, but that isn't always practical in the real world.
So when should IoC really be used? Maybe only when the injected types are immutable and pure, with everything else composed and encapsulated? If that's the answer, what does it mean for TDD?
First, Inversion of Control is not the same as Dependency Injection. DI is just one implementation of IoC. This question makes more sense if we limit it to just DI.
Second, Dependency Injection is orthogonal to Test Driven Development. DI can make writing unit tests easier, which may encourage you to write more unit tests; but that does not necessitate TDD. You can certainly use DI without TDD, and I suspect that's the way the vast majority of developers use it. TDD is not a widespread practice.
Conversely, practicing TDD may encourage you to implement DI; but that is far from a requirement. Don't confuse statements like, "TDD and DI work well together," with "TDD and DI require each other." They can be used together or separately.
Finally, if you want to use your DI container as a repository of global variables, you certainly can. This approach of storing mutable state and injecting it across your application brings the same caveats and pitfalls as sharing mutable state anywhere else.
That should be the main takeaway from this question: not the downside of DI or TDD, but the downside of mutable state in general. You don't need DI to run afoul of mutable state. Trouble with mutable state is virtually guaranteed in the practice of imperative programming, which is by far the most common programming paradigm.
Consider that the functional programmers might really be onto something with their declarative approach.
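To make that concrete: one way to defuse the original example is to remove the mutable state from the abstraction entirely. Here is a minimal sketch; the parameterized FindFiles signature is my own variation on the question's interface, not something from the original:

// A stateless variant of the searcher: the path is an argument, not stored state.
public interface IFileSearcher
{
    List<string> FindFiles(string searchPath, string searchPattern);
}

public class PlayListViewer
{
    private readonly IFileSearcher searcher;
    private readonly string playlistPath;

    public PlayListViewer(string playlistName, IFileSearcher searcher)
    {
        this.searcher = searcher;
        this.playlistPath = $"playlists/{playlistName}";
    }

    public List<string> FindSongNames()
    {
        // No caller can corrupt the path between calls: it never leaves this class.
        return searcher.FindFiles(playlistPath, "*.mp3|*.wav|*.flac")
            .Select(f => Path.GetFileName(f))
            .ToList();
    }
}

Sharing the searcher between components is now harmless, because the only state it touches arrives through its parameters.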
Background
I'm in the process of reworking and refactoring a huge codebase which was written with neither testability nor maintainability in mind. There is a lot of global/static state going on. A function needs a database connection, so it just conjures one up using a global static method: $conn = DatabaseManager::getConnection($connName);. Or it wants to load a file, so it does it using $fileContents = file_get_contents($hardCodedFilename);.
Much of this code does not have proper tests and has only ever been tested directly in production. So the first thing I intend to do is write unit tests, to ensure the functionality is correct after refactoring. Sadly, code like the examples above is barely unit-testable, because none of the external dependencies (database connections, file handles, ...) can be properly mocked.
Abstraction
To work around this, I have created very thin wrappers around, for example, the system functions, which can be used in places where non-mockable function calls were used before. (I'm giving these examples in PHP, but I assume they apply to any other OOP language as well. Also, this is a highly shortened example; in reality I am dealing with much larger classes.)
interface Time {
    /**
     * Returns the current time in seconds since the epoch.
     * @return int for example: 1380872620
     */
    public function current();
}

class SystemTime implements Time {
    public function current() {
        return time();
    }
}
These can be used in the code like so:
class TimeUser {
    /**
     * @var Time
     */
    private $time;

    /**
     * Prints out the current time.
     */
    public function tellsTime() {
        // before:
        echo time();
        // now:
        echo $this->time->current();
    }
}
Since the application only depends on the interface, I can replace it in a test with a mocked Time instance, which, for example, lets me predefine the value returned by the next call to current().
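For illustration, here is a minimal sketch of such a test double, written in C# to mirror the PHP interface above (the ITime and FixedTime names are mine, not from the codebase):

// Mirrors the Time interface: returns seconds since the epoch.
public interface ITime
{
    long Current();
}

// A test double with a predefined return value.
public class FixedTime : ITime
{
    private readonly long seconds;

    public FixedTime(long seconds)
    {
        this.seconds = seconds;
    }

    public long Current()
    {
        return seconds;
    }
}

A test can then construct the consumer with new FixedTime(1380872620) and assert against a known timestamp.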
Injection
So far so basic. My actual question is how to get the proper instances into the classes that depend upon them. From my understanding of Dependency Injection, services are meant to be passed down by the application into the components that need them. Usually these services would be created in a main() method or at some other starting point and then strung along until they reach the components where they are needed.
This model likely works well when creating a new application from scratch, but for my situation it's less than ideal, since I want to move gradually to a better design. So I've come up with the following pattern, which automatically provides the old functionality while leaving me with the flexibility of substituting services.
class TimeUser {
    /**
     * @var Time
     */
    private $time;

    public function __construct(Time $time = null) {
        if ($time === null) {
            $time = new SystemTime();
        }
        $this->time = $time;
    }
}
A service can be passed into the constructor, allowing the service to be mocked in a test; yet during "regular" operation, the class knows how to create its own collaborators, providing default functionality identical to what was there before.
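This pattern is sometimes called "poor man's dependency injection". For comparison, a C# rendering of the same idea might look like this (a sketch reusing the illustrative ITime interface from above; SystemTime here is an assumed production implementation, not code from the original system):

// Production implementation (assumed): wraps the system clock.
public class SystemTime : ITime
{
    public long Current()
    {
        return DateTimeOffset.UtcNow.ToUnixTimeSeconds();
    }
}

public class TimeUser
{
    private readonly ITime time;

    // Production code uses the default; tests inject a fake.
    public TimeUser() : this(new SystemTime()) { }

    public TimeUser(ITime time)
    {
        this.time = time;
    }
}

The chained constructor keeps the default in exactly one place, so removing it later is a one-line change.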
Problem
I've been told that this approach is unclean and subverts the idea of dependency injection. I do understand that the true way would be to pass dependencies down, as outlined above, but I don't see anything wrong with this simpler approach. Keep in mind also that this is a huge system, where potentially hundreds of services would need to be created up front. (A service locator would be an alternative, but for now I am trying to go in this other direction.)
Can someone shed some light onto this issue and provide some insight into what would be a better way to achieve a refactoring in my case?
I think you've made a good first step.
Last year I was at DutchPHP and there was a lecture about refactoring. The lecturer described three major steps for extracting a responsibility from a god class:
1. Extract the code to a private method (it should be a simple copy-paste, since $this is the same)
2. Extract the code to a separate class and pull the dependency
3. Push the dependency
I think you are somewhere between the 1st and 2nd steps. You have a backdoor for unit tests.
The next thing, according to the algorithm above, is to create some static factory (the lecturer named it ApplicationFactory) which will be used instead of creating the instance inside TimeUser.
ApplicationFactory is a kind of Service Locator pattern. This way you invert the dependency (in line with the SOLID principles).
If you are happy with that, you should remove passing the Time instance into the constructor and use the service locator only (without the backdoor for unit tests; you should stub the service locator instead).
If you are not, then you have to find all the places where TimeUser is being instantiated and inject the Time implementation:
new TimeUser(ApplicationFactory::getTime());
After some time your ApplicationFactory will become very big. Then you have to make a decision:
Split it into smaller factories
Use some dependency injection container (Symfony DI, Auryn DI, or something like that)
Currently my team is doing something similar. We are extracting responsibilities into separate classes and injecting them. We have an ApplicationFactory, but we use it as a service locator at as high a level as possible, so the classes below it get all their dependencies injected and don't know anything about the ApplicationFactory. Our ApplicationFactory is big, and we are now preparing to replace it with Symfony DI.
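For concreteness, a minimal sketch of such a static factory, shown in C# with the illustrative ITime/SystemTime names from the earlier sketches (the PHP shape is the same):

// A simple static factory / service locator used during the transition.
public static class ApplicationFactory
{
    private static ITime time;

    public static ITime GetTime()
    {
        // Lazily create the production implementation once.
        if (time == null)
        {
            time = new SystemTime();
        }
        return time;
    }

    // Tests can swap in a stub.
    public static void SetTime(ITime replacement)
    {
        time = replacement;
    }
}

Note that the setter is exactly the kind of global mutable state discussed earlier in this document, which is why this is a transitional step toward a real DI container rather than a destination.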
You asked for a good mechanism to do this.
You've described some stages you might force the program to go through to accomplish this, but you are still apparently planning to do it by hand, at what is apparently a very high cost.
If you really want to get this done on a huge code base, you might consider automating the steps using a program transformation engine: http://en.wikipedia.org/wiki/Program_transformation
Such a tool lets you write explicit rules for modifying code. Done right, this can make code changes reliable. It doesn't reduce your need for testing, but it can let you spend more time writing tests and less time changing code by hand (and introducing errors as you do).
I'm not so sure the title is a good match for the question I want to put on the table.
I'm planning to create a web MVC framework as my graduation dissertation, and in a previous conversation with my advisor, while trying to define some goals, he convinced me that I should choose a modular design for this project.
I had already developed some things by then, and I stopped for a while to analyze how modular it would be. I couldn't really do it, because I don't know the real meaning of "modular".
Some things are not very clear to me. For example, does merely referencing another module blow up the modularity of my system?
Let's say I have a Database Access module, and it can OPTIONALLY use a Cache module for storing the results of complex queries. As anyone can see, I will at least have a naming dependency on the Cache module.
In my conception of "modular design", I can distribute each component separately and make it interact with others developed by other people. In the case I showed, if someone wants to use my Database Access module, they will have to take the Cache module as well, even if they will not use it, just for referencing/naming purposes.
And so, I was wondering whether this is really a modular design.
I came up with an alternative: create each component singly, without it even knowing about the existence of other components that are not absolutely required for its functioning. To extend functionality, I could create some structure based on Decorators and Adapters.
To clarify things a little bit, here is an example (in PHP):
Before
interface Cache {
    public function isValid();
    public function setValue();
    public function getValue();
}

interface CacheManager {
    public function get($name);
    public function put($name, $value);
}

// Some concrete implementations...

interface DbAccessInterface {
    public function doComplexOperation();
}

class DbAccess implements DbAccessInterface {
    private $cacheManager;

    public function __construct(..., CacheManager $cacheManager = null) {
        // ...
        $this->cacheManager = $cacheManager;
    }

    public function doComplexOperation() {
        if ($this->cacheManager !== null) {
            // return from cache if valid
        }
        // complex operation
    }
}
After
interface Cache {
    public function isValid();
    public function setValue();
    public function getValue();
}

interface CacheManager {
    public function get($name);
    public function put($name, $value);
}

// Some concrete implementations...

interface DbAccessInterface {
    public function doComplexOperation();
}

class DbAccess implements DbAccessInterface {
    public function __construct(...) {
        // ...
    }

    public function doComplexOperation() {
        // complex operation
    }
}

// And now the integration module
class CachedDbAccess implements DbAccessInterface {
    private $dbAccess;
    private $cacheManager;

    public function __construct(DbAccessInterface $dbAccess, CacheManager $cacheManager) {
        $this->dbAccess = $dbAccess;
        $this->cacheManager = $cacheManager;
    }

    public function doComplexOperation() {
        $cache = $this->cacheManager->get("Foo");
        if ($cache->isValid()) {
            return $cache->getValue();
        }
        // Delegate to the wrapped DbAccess, then cache the result...
        return $this->dbAccess->doComplexOperation();
    }
}
Now my question is:
Is this the best solution? Should I do this for all modules that are not required to work together, but can be more efficient when they do?
Would anyone do it in a different way?
I have some further questions about this, but I don't know if this is an acceptable question for Stack Overflow.
P.S.: English is not my first language, so some parts may be a little confusing.
Some resources (not theoretical):
Nuclex Plugin Architecture
Python Plugin Application
C++ Plugin Architecture (use NoScript on that site; they have some weird login policies)
Other SO threads (design pattern for plugins in php)
Django Middleware concept
Does just referencing another module blow up the modularity of my system?
Not necessarily. It's a dependency. Having dependencies is perfectly normal. Without dependencies, modules can't interact with each other (unless you arrange such interaction indirectly, which in general is bad practice, since it hides dependencies and complicates the code). Modular design implies managing dependencies, not removing them.
One tool is interfaces. Referencing a module via an interface creates a so-called soft dependency. Such a module can accept any implementation of the interface as a dependency, so it is more independent and, as a result, more maintainable.
Another tool is designing modules (and their interfaces) with only a single responsibility. This also makes them more granular, independent, and maintainable.
But there is a line you should not cross: blindly applying these tools may lead to an overly modular and overly generic design. Making things too granular makes the whole system more complex. You should not try to solve the problems of the universe by making generic modules that all developers can use (unless that is your goal). First of all, your system should solve your domain tasks; make things generic enough, but no more than that.
I came up with an alternative: create each component singly, without it even knowing about the existence of other components that are not absolutely required for its functioning
It is great if you came up with this idea by yourself. The statement itself is a key principle of modular programming.
A plugin architecture is the best in terms of extensibility, but IMHO it is hard to maintain, especially within a single application. And depending on the complexity of the plugin architecture, it can make your code more complex by adding plugin logic, etc.
Thus, for intra-application modular design, I choose an N-tier, interface-based architecture. Basically, the architecture relies on these tiers:
1. Domain / Entity
2. Interface [depends on 1]
3. Services [depends on 1 and 2]
4. Repository / DAL [depends on 1 and 2]
5. Presentation Layer [depends on 1, 2, 3, 4]
Unfortunately, I don't think this is achievable neatly in PHP projects, as it needs separate projects / DLL references for each tier. However, following the architecture can still help to modularize the application.
For each module, we need to do interface-based design. It helps to enhance the modularity of your code, because you can change the implementation later while keeping the consumer the same.
I have provided an answer similar to this interface-based design at this Stack Overflow question.
Last but not least, if you want to make your application modular up to the UI, you can use a Service-Oriented Architecture. This simply makes your application a bunch of services, and then makes the UI consume those services. This design helps to separate your UI from your logic. You could later use a different UI, such as a desktop app, but still use the same logic. Unfortunately, I don't have any reliable source for SOA.
EDIT:
I misunderstood the question. This is my point of view on a modular framework. Unfortunately, I don't know much about Zend, so I will give examples in C#:
It consists of modules, from smaller to larger ones. For example, in C# you can use Windows Forms (larger) in your application, and also the Graphics class (smaller) to draw custom shapes on the screen.
It is extensible or replaceable without changing the base class. In C# you can attach a handler to a Form's Load event (extensible), inherit from the Form or List class (extensible), or override the form's paint method to create custom window graphics (replaceable).
(Optional) It is easy to use. In a normal DI interface design, we usually inject smaller modules into a larger (higher-level) module. This will require an IoC container. Refer to my question for details.
It is easy to configure, and does not involve any magical logic such as the Service Locator pattern. Search for "Service Locator is an anti-pattern" on Google.
I don't know much about Zend, but I guess modularity in Zend means that it can be extended without changing the core (replacing the code) inside the framework.
If you say that:
if someone wants to use my Database Access module, they will have to take the Cache module as well, even if they will not use it, just for referencing/naming purposes.
then it is not modular. It is integrated, meaning that your Database Access module will not work without the Cache. For reference, the C# class library chooses to provide both List<T> and BindingList<T> for different functionality. In your case, IMHO it is better to provide both a CachedDbAccess and a DbAccess.
Code examples are C# but this is a general OO question.
I know according to OO rules, class coupling should be minimised and members should be kept private wherever possible, etc.
Consider this example:
You are writing an esoteric program which has some sort of data set (I'm not talking about System.Data.DataSet) which is used in literally every facet of the program. In fact, the program basically exists just to load, display, manipulate, and save the data set. Furthermore, there can only ever be one data set loaded at any time, and it is loaded when the program opens.
If we follow OO rules strictly, we would have
public void ShowSomeGraphs(IData data)
{
// do stuff with data which implements IData
}
however we could potentially store a public static Data member in Program, for example.
public void ShowSomeGraphs()
{
// do stuff with Program.Data
}
On one hand, we have traded a slightly shorter function signature for vastly increased class coupling. On the other hand, we are no longer passing a Data parameter to practically every function, everywhere.
The right answer probably is: Avoid class coupling wherever possible. The local Data variables are just pointers so the memory overhead is negligible, and because the classes are decoupled they can be used elsewhere at a later date.
Though realistically speaking, the structure of the Data class will likely be phenomenally different in a different application, so it's not like you can just pull a class from this program and drop it in somewhere else without any tweaks. The extra time and effort required to write the classes in such a way that they can just be dropped in might be difficult to justify to a stakeholder.
I'm working on this sort of program now, and I have used the OO-canon approach: Data parameters are passed around where needed, and I have minimised class coupling with an IData interface to generalise the data set for future code re-use. Given the application, I'm almost certain this code won't ever be re-used. Without these extra interfaces and abstractions, the program would have worked exactly the same as far as the end user is concerned, but it would have meant significantly fewer headaches and less development time for me.
What do you think about this? Do you think it's justifiable to spend all the extra time to write the interfaces and generalisations to ensure classes are decoupled where possible, especially when you can't see the classes being use elsewhere later?
Don't agonise over it. Seriously.
Software paradigms/patterns are there to help us and not to be followed dogmatically.
You make it clear in your question that you consider the loose coupling overkill, and you can justify why. Therefore, don't use it.
How about using the singleton pattern to provide a method or read-only property to get the IData interface? This way you're only coupled to a very thin singleton class and all your interactions with the data set are done through the IData interface.
(I would definitely avoid the tight coupling. Even if you don't plan to do much with this app chances are that you will run into an issue during development which will force you to touch significantly more code than if you accessed the data via an interface.)
Code sample of the singleton solution proposed above:
using System;
public class MyClass {
public static void Main() {
// simple usage:
Console.WriteLine("From Main: " + Singleton.Instance.IMyData.GetData());
// client code from another type:
new ClientObj().DoWork();
Console.ReadKey();
}
}
public sealed class Singleton {
// standard singleton stuff:
private static readonly Singleton _instance = new Singleton();
private Singleton(){}
public static Singleton Instance {get { return _instance; }}
// data interface stuff:
private MyData _myData = new MyData();
public IData IMyData {get { return _myData; }}
}
// the interface:
public interface IData {
string GetData();
}
// concrete implementation of the data class
public class MyData : IData {
public string GetData() {return "Hello World!";}
}
// example of a type using the singleton and the IData interface
public class ClientObj {
public void DoWork() {
IData data = Singleton.Instance.IMyData;
string str = data.GetData();
Console.WriteLine("From other obj: " + str);
}
}
Some caveats: The code sample above is completely stripped down to show the concept of a singleton and shared interface. There is no thread safety implemented, there is no initialization of the data object etc.
Well, there's one big assumption in your text: there will only ever be one data set in the program. Are you sure that condition will hold for all time? There was a time when word processors could only hold one text at a time. Today it's standard to be able to have several files open at once. I'd also not be surprised if the first web browsers could only open one web page at a time. Today nobody would use a web browser that could not have several pages open at the same time. I think the sort of object where you can say for certain there will only ever be one of it in the program is quite rare. Indeed, the only things I would make global objects or singletons are object factories.
On the other hand, passing the object around for each function call seems to be overkill to me, too. Therefore I would go for the middle ground: Have the objects remember that "global" object, so you only have to pass it via the constructor. This limits each single object to one Data object, but still allows you to easily have several Data objects in your program should you ever decide to.
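A minimal sketch of that middle ground (GraphView is an illustrative name; IData is the interface from the question):

// Each component receives the Data object once, at construction time.
public class GraphView
{
    private readonly IData data;

    public GraphView(IData data)
    {
        this.data = data;
    }

    public void ShowSomeGraphs()
    {
        // do stuff with this.data: no global lookup, no per-call parameter
    }
}

Opening a second data set later just means constructing a second GraphView around it; nothing else has to change.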
When I first started working with object-oriented programming languages, I was taught the following rule:
When declaring a field in a class, don't initialize it yet. Do that in the constructor.
An example in C#:
public class Test
{
private List<String> l;
public Test()
{
l = new List<String>();
}
}
But when someone recently asked me why to do that, I couldn't come up with a reason.
I'm not really familiar with the internal workings of C# (or other programming languages, for that matter, as I believe this can be done in all OO languages).
So why is this done? Is it security? Properties?
If you have multiple constructors, you might want to initialize a field to different values.
When you initialize the field in the constructor, there can be no confusion over when exactly it is initialized in regard to the rest of the constructor. This may seem trivial with a single class, but not so much when you have an inheritance hierarchy with constructor code running at each level and accessing superclass fields.
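To illustrate that ordering in C#: field initializers run before constructor bodies, and a derived class's initializers run even before the base constructor. A small self-contained sketch:

using System;

class Base
{
    protected int baseValue = 10; // runs before Base's constructor body

    public Base()
    {
        Console.WriteLine("Base ctor, baseValue = " + baseValue);
    }
}

class Derived : Base
{
    private int derivedValue = 20; // runs even before the Base constructor body

    public Derived()
    {
        // By the time we get here, both initializers and Base() have run.
        Console.WriteLine("Derived ctor: " + baseValue + ", " + derivedValue);
    }
}

Moving the initializations into the constructors makes this sequence explicit at each level, which is the readability argument above.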
The C# compiler will take any non-static member initialization that you do inline and move it into the constructor for you. In other words, this:
class Test
{
Object o = new Object();
}
gets compiled to this:
class Test
{
Object o;
public Test()
{
this.o = new Object();
}
}
I am not sure how compilers for other languages handle this but as far as C# is concerned it is a matter of style and you are free to do whichever you wish. Please note that static fields are handled differently: read this article for more information on that.
One reason to do it is that it puts all of the initialization code in one place which is convenient for others reading your class. Having said this I don't really do it for two primary reasons. (1) I use TDD/Unit testing to define the behavior of my class. If you want to know what the parameterless constructor does, you should really read the tests I've built on the parameterless constructor. (2) With C# 3.0, I typically use automatic properties and inline initialization with a parameterless constructor to instantiate the object. This is much more flexible and it puts the definition of the properties right in line where the code is being used. This would override any initialization in the constructor so I rarely put any there. Of course, this only applies to C#.
Example (of reason 2):
var foo = new Foo { Bar = "baz" };
public class Foo
{
public string Bar { get; set; }
public Foo() { }
}
Sometimes the constructor has parameters that are used to initialize internal variables, for example the size of arrays.
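A quick sketch of that case (the RingBuffer name is illustrative):

public class RingBuffer
{
    private readonly byte[] data;

    // The field cannot be initialized inline: its size is
    // only known once the constructor argument arrives.
    public RingBuffer(int capacity)
    {
        data = new byte[capacity];
    }
}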
I haven't heard a compelling reason to not offer both options. I suspect that the real reason has to do with simplifying the language structure from a parsing perspective. This is especially true in C-derivative languages where parsing an assignment statement requires 75% of the language syntax rules. It seems to me that allowing it and defining how it would work precisely would be nice. I agree with Michael's comment about the complexity increase as you insert inheritance and multiple constructors but just because you add a feature doesn't mean that you have to use it. I would vote to support both even though my vote doesn't really add up to much.
I always like to think of the class as a factory for objects, and the constructor as the final stop on the production line. The fields declared in the class are blueprints describing the object, but the blueprint won't be realised into an object before such an object is ordered through a call to the constructor. Also, as someone pointed out, doing all your initialisation in your constructor improves readability, and it allows for dynamic initialisation (it might not be a parameterless constructor you're dealing with).
Also, in some languages the constructor may be used for resetting an object to its original state, which is why it is then necessary to initialise the object's fields in the constructor.
So, I was reading the Google testing blog, and it says that global state is bad and makes it hard to write tests. I believe it--my code is difficult to test right now. So how do I avoid global state?
The biggest thing I use global state for (as I understand it) is managing key pieces of information across our development, acceptance, and production environments. For example, I have a static class named "Globals" with a static member called "DBConnectionString". When the application loads, it determines which connection string to load and populates Globals.DBConnectionString. I load file paths, server names, and other information in the Globals class.
Some of my functions rely on the global variables. So, when I test my functions, I have to remember to set certain globals first or else the tests will fail. I'd like to avoid this.
Is there a good way to manage state information? (Or am I understanding global state incorrectly?)
Dependency injection is what you're looking for. Rather than having those functions go out and look for their dependencies, inject the dependencies into the functions. That is, when you call the functions, pass the data they need to them. That way it's easy to put a testing framework around a class, because you can simply inject mock objects where appropriate.
It's hard to avoid some global state, but the best way to do this is to use factory classes at the highest level of your application, and everything below that very top level is based on dependency injection.
Two main benefits: one, testing is a heck of a lot easier, and two, your application is much more loosely coupled. You rely on being able to program against the interface of a class rather than its implementation.
Keep in mind that if your tests involve actual resources such as databases or filesystems, then what you are writing are integration tests rather than unit tests. Integration tests require some preliminary setup, whereas unit tests should be able to run independently.
You could look into the use of a dependency injection framework such as Castle Windsor but for simple cases you may be able to take a middle of the road approach such as:
public interface ISettingsProvider
{
string ConnectionString { get; }
}
public class TestSettings : ISettingsProvider
{
public string ConnectionString { get { return "testdatabase"; } }
}
public class DataStuff
{
private ISettingsProvider settings;
public DataStuff(ISettingsProvider settings)
{
this.settings = settings;
}
public void DoSomething()
{
// use settings.ConnectionString
}
}
In reality you would most likely read from config files in your implementation. If you're up for it, a full-blown DI framework with swappable configurations is the way to go, but I think this is at least better than using Globals.ConnectionString.
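For illustration, a production implementation might read from App.config via System.Configuration; a sketch (it assumes a connection string entry named "DB" exists in the config file):

using System.Configuration;

public class AppConfigSettings : ISettingsProvider
{
    public string ConnectionString
    {
        get
        {
            // Reads <connectionStrings><add name="DB" ... /> from App.config
            return ConfigurationManager.ConnectionStrings["DB"].ConnectionString;
        }
    }
}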
Great first question.
The short answer: make sure your application is a function from ALL its inputs (including implicit ones) to its outputs.
The problem you're describing doesn't seem like global state. At least not mutable state. Rather, what you're describing seems like what is often referred to as "The Configuration Problem", and it has a number of solutions. If you're using Java, you may want to look into light-weight injection frameworks like Guice. In Scala, this is usually solved with implicits. In some languages, you will be able to load another program to configure your program at runtime. This is how we used to configure servers written in Smalltalk, and I use a window manager written in Haskell called Xmonad whose configuration file is just another Haskell program.
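To make the short answer concrete in C#: instead of reaching out to an ambient value, take it as an input. A sketch (the clock delegate and the Greeter name are illustrative):

using System;

public class Greeter
{
    private readonly Func<DateTime> clock;

    // The implicit input (the current time) becomes an explicit one.
    public Greeter(Func<DateTime> clock)
    {
        this.clock = clock;
    }

    public string Greet()
    {
        return clock().Hour < 12 ? "Good morning" : "Good afternoon";
    }
}

// Production: new Greeter(() => DateTime.Now)
// Test:       new Greeter(() => new DateTime(2021, 10, 1, 9, 0, 0))

With every input explicit, the output is a pure function of the arguments, and the test needs no global setup.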
Here is an example of dependency injection in an MVC setting:
index.php
$container = new Container();
include 'container.php';
container.php
container.add("database.driver", "mysql");
container.add("database.name","app");
...
$container.add(new Database($container->get('database.driver', "database.name")), 'database');
$container.add(new Dao($container->get('database')), 'dao');
$container.add(new Service($container->get('dao')));
$container.add(new Controller($container->get('service')), 'controller');
$container.add(new FrontController(),'frontController');
index.php continues here:
$frontController = $container->get('frontController');
$controllerClass = $frontController->getController($_SERVER['REQUEST_URI']);
$controllerAction = $frontController->getAction($_SERVER['REQUEST_URI']);
$controller = $container->get('controller');
$controller->$controllerAction();
And there you have it: the controller depends on a service-layer object, which depends on a DAO (data access object), which depends on a database object, which depends on the database driver, name, etc.