While learning about mutation testing, I read the following in Wikipedia:
The coupling effect asserts that simple faults can cascade or couple
to form other emergent faults.
Subtle and important faults are also revealed by higher-order mutants,
which further support the coupling effect
I didn't quite understand the coupling effect hypothesis. Could someone elaborate on it with some concrete examples?
The Wikipedia definition of the coupling effect is nonsense. If you look through the history, there were attempts to fix it around 2014/2015, but a particular user kept reverting them.
This paper by Offutt gives a clear and authoritative definition:
Test data that distinguishes all programs differing from a correct one
by only simple errors is so sensitive that it also implicitly
distinguishes more complex errors.
Since examples of complex faults that are not coupled to simple faults
can be constructed, the coupling effect is probabilistic rather than
absolute.
https://cs.gmu.edu/~offutt/rsrch/papers/coupl.pdf
In other words, if a test can detect a simple fault at a location in the code, it will (probably) also detect more complex faults at the same location.
This is considered important because the faults inserted by mutation testing are usually simple, single changes to the code.
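As a concrete, made-up illustration in Kotlin: mutation tools insert exactly this kind of single-operator change, and a test precise enough to kill the mutant must probe the boundary, which is also where more complex faults tend to surface.

fun isAdult(age: Int): Boolean = age >= 18        // original
fun isAdultMutant(age: Int): Boolean = age > 18   // first-order mutant: >= became >

fun main() {
    // The boundary input 18 distinguishes the mutant from the original,
    // so a test asserting isAdult(18) == true "kills" the mutant.
    println(isAdult(18))        // true
    println(isAdultMutant(18))  // false
}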
When looking at Arrow's documentation about functional error handling, one of the reasons listed to avoid throwing exceptions is the performance cost (referencing The hidden performance costs of instantiating Throwables).
So it is suggested to model errors/failures as an Effect.
When building an Effect, the shift() method should be used to interrupt a computation (under the hood, it is also what is used to "unwrap" the effects through the bind() method).
Looking at the shift() method implementation, it seems that its magic is done by... throwing an exception. That means exceptions are created not only when we want to signal an error, but also to "unwrap" a missing Option, the Left instance of an Either, and all the other effect types exposed by the library.
What I'm not getting is whether there's some optimization done to avoid the issues with "the hidden performance costs of instantiating Throwables", or whether in the end they are not a real problem.
The argument that this is the biggest reason for using typed errors on the JVM is probably an overstatement; there are better reasons for using typed errors. Exceptions are not typed, so they are not tracked by the compiler. This is what we want to avoid if we care about type safety, or purity. This will be better reflected in the documentation of 2.x.x.
Avoiding the performance penalty can be a benefit in hot-loops, but in general application programming it can probably be neglected.
However, to answer your question on how this is dealt with in Kotlin and Arrow:
In Kotlin, cancellation of coroutines works through CancellationException, so using this mechanism is required to work correctly with the language. You can find more details in the Arrow 2.x.x Raise design document.
It's possible to remove the performance penalty of exceptions, which is also what Arrow does. (Except for a small regression in a single version, which was fixed in the next release.)
An example of this can also be found in the official KotlinX Coroutines library, which applies the same technique by disabling stack traces for JobCancellationException.
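For reference, the underlying JVM technique is to skip stack-trace capture for exceptions that are used purely for control flow. A minimal sketch (the class name is made up; it is not Arrow's actual internal type):

// Filling in the stack trace is the dominant cost of constructing a
// Throwable; overriding fillInStackTrace() to do nothing removes it.
class ControlFlowException : Exception() {
    override fun fillInStackTrace(): Throwable = this
}

On JDK 7+ the same effect can be had by passing writableStackTrace = false to Throwable's protected four-argument constructor.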
I read this question (and several others):
What's the difference between the atomic and nonatomic attributes?
I fully understand (at least I hope so :-D ) how the atomic/nonatomic specifiers for properties work:
Atomic guarantees that a "read" operation won't be interrupted by a "write" operation.
Nonatomic doesn't guarantee this.
Neither atomic nor nonatomic solves race conditions, where one thread is reading while two threads are writing. There is no way to predict what result the read operation will return. This needs to be solved by additional synchronization.
Neither atomic nor nonatomic guarantees overall data integrity; one thread could set one property while another thread sets a second property to a state that is inconsistent with the state of the first property. This also needs to be solved by additional synchronization (see the sketch after this list).
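To make points 1 and 3 concrete, here is a sketch in Kotlin (Objective-C's atomic is approximated by per-property locking, and all names are hypothetical): each accessor is individually safe, yet the pair of properties can still be observed in a state that no single writer ever produced.

import kotlin.concurrent.thread

class Point {
    private val lock = Any()
    var x: Int = 0
        get() = synchronized(lock) { field }   // no torn reads (point 1)...
        set(value) = synchronized(lock) { field = value }
    var y: Int = 0
        get() = synchronized(lock) { field }
        set(value) = synchronized(lock) { field = value }
}

fun main() {
    val p = Point()
    thread { p.x = 1; p.y = 1 }   // the writer maintains the invariant x == y
    // ...but the compound read below may still print "1, 0" (point 3):
    // per-property atomicity does not make the pair consistent.
    println("${p.x}, ${p.y}")
}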
What makes me raise an eyebrow is that people are divided into two camps:
Pro atomic: It makes sense to use nonatomic only as a performance optimization.
And if you are not optimizing, then you should always use atomic because of point 1. This way you won't get some complete crap when reading this property in a multi-threaded application. And sure, if you care about points 2 and 3, you need to add more synchronization on top of it.
Against atomic: It doesn't make sense to use atomic at all.
Since atomic doesn't solve all the problems in a multi-threaded application, it doesn't make sense to use it at all, since you will need to add more synchronization code on top of it anyway. It will just make things slower.
I am leaning to the pro-atomic camp, but I want to do a sanity check that I didn't miss anything.
Lacking a very specific question (though still a good question), I'll answer with personal experience, FWIW.
In general, concurrency design is hard. With modern conveniences like GCD and ARC, the tools for implementing concurrent systems have certainly improved. However, the architecture of concurrency is still very hard.
And, generally, the hard part has nothing to do with individual properties or individual getters and setters. Concurrency is something that is implemented at a higher level.
The current state of the art is concurrency in isolation. That is, the parts of your app that are running concurrently are doing so using isolated graphs of objects that have extremely minimal connection to the rest of your application (typically, the "connections" are via callbacks that bundle up a bit of state and toss it over to some other queue, often the main queue for updating the UI).
By keeping the concurrency surface area -- the # of entry points into your code that must be concurrency safe -- to an absolute minimum, you reduce both complexity and the amount of time you'll spend debugging really weird, oft irreproducible, concurrency problems (that'll eat at your sanity).
Given all that, the value of atomic properties is pretty minimal. Sure, they can be useful along what should be the very very small set of interfaces -- of API -- that might be banged upon from multiple threads, but that is about it.
If you have objects for which the accessors are being banged on rapidly, making them atomic can be a significant performance hit, but premature optimization is the devil's fingers at play.
How does the SOLID "Interface Segregation Principle" differ from "Single Responsibility Principle"?
The Wikipedia entry for SOLID says that
ISP splits interfaces which are very large into smaller and more specific ones so that clients will only have to know about the methods that are of interest to them
However, to me that sounds like just applying the SRP to interfaces as well as classes. After all, if an interface is only responsible for just one conceptual thing, then you wouldn't be able to break it down further.
Am I missing something, or is ISP sort of redundant with SRP? If not, then what does ISP imply that SRP does not?
SRP tells us that you should only have a single responsibility in a module.
ISP tells us that you should not be forced to be confronted with more than you actually need. If you want to use a print() method from interface I, you shouldn't have to instantiate a SwimmingPool or a DriveThru class for that.
More concretely, and going straight to the point, they are different views on the same idea -- SRP is more focused on the designer-side point-of-view, while ISP is more focused on the client-side point-of-view. So you're basically right.
It all came from
The ISP was first used and formulated by Robert C. Martin when doing
some consulting for Xerox. Xerox had created a new printer system that
could perform a variety of tasks like stapling a set of printed papers
and faxing. The software for this system was created from the ground
up and performed its tasks successfully. As the software grew, making
modifications became more and more difficult, so that even the smallest
change would take a redeployment cycle of an hour. This made it
near impossible to continue development. The design problem was that
one main Job class was used by almost all of the tasks. Anytime a
print job or a stapling job had to be done, a call was made to some
method in the Job class. This resulted in a huge or 'fat' class with
multitudes of methods specific to a variety of different clients.
Because of this design, a staple job would know about all the methods
of the print job, even though there was no use for them.
so
The solution suggested by Martin is what is called the Interface
Segregation Principle today. Applied to the Xerox software, a layer of
interfaces between the Job class and all of its clients was added
using the Dependency Inversion Principle. Instead of having one large
Job class, a Staple Job interface or a Print Job interface was created
that would be used by the Staple or Print classes, respectively,
calling methods of the Job class. Therefore, one interface was created
for each job, which were all implemented by the Job class.
http://en.wikipedia.org/wiki/Interface_segregation_principle#Origin
SRP is concerned with what a module does and how it is done, disallowing any mix of abstraction levels. Basically, as long as a component can be fully described in a single sentence, it will not break SRP.
On the other hand, ISP is concerned with how a module should be consumed: whether it makes sense to consume just part of the module while ignoring some aspects.
An example of code that keeps the spirit of SRP but can break ISP is the Facade pattern. It has a single responsibility, "providing simplified access to a larger subsystem", but if the underlying subsystem needs to expose wildly different things, it does break ISP.
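A minimal sketch of that situation in Kotlin, with entirely made-up names:

// One describable responsibility -- "simplified access to the order
// subsystem" -- yet a report-only client is still forced to depend on
// ordering and refunding: SRP is kept while ISP is broken.
class OrderFacade {
    fun placeOrder(itemId: String, qty: Int) = println("ordered $qty of $itemId")
    fun refund(orderId: String) = println("refunded $orderId")
    fun monthlyReport(): String = "report"   // the only member a report client needs
}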
That said, when a piece of code breaks a SOLID principle, it usually breaks the whole lot. Concrete examples that break a specific principle while preserving the rest are rare in the wild.
Robert Martin tweeted the following on May 16, 2018.
ISP can be seen as similar to SRP for interfaces; but it is more than that. ISP generalizes into: “Don’t depend on more than you need.” SRP generalizes to “Gather together things that change for the same reasons and at the same times.”
Imagine a stack class with both push and pop. Imagine a client that only pushes. If that client depends upon the stack interface, it depends upon pop, which it does not need. SRP would not separate push from pop; ISP would.
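A sketch of the tweeted example in Kotlin (the interface names are mine):

// ISP separates push from pop because their clients differ; SRP would
// keep them together because they change for the same reasons.
interface Pusher<T> { fun push(item: T) }
interface Popper<T> { fun pop(): T }

class Stack<T> : Pusher<T>, Popper<T> {
    private val items = ArrayDeque<T>()
    override fun push(item: T) = items.addLast(item)
    override fun pop(): T = items.removeLast()
}

// A push-only client depends on nothing it does not use:
fun fill(target: Pusher<Int>) = (1..3).forEach(target::push)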
SRP and ISP ultimately boil down to the same thing: implementing either of them requires splitting classes or interfaces.
However, there are differences on other fronts.
Violation of SRP can have far-reaching effects on the entire design structure, giving rise to poor maintainability and reuse and, of course, low cohesion and high coupling.
SRP has an impact on both the behavioral and structural components of an object structure.
Redesigning after an SRP violation needs a much deeper analysis and requires looking at the different components of the design in a holistic way.
Violation of ISP is mostly about poor readability (and, to some degree, low cohesion), but the impact on maintenance and code reuse is far less sinister than with SRP.
Moreover, refactoring code into ISP conformance seems to be just a structural change.
See also my blog for SRP and ISP
From my understanding, both principles are complementary, i.e. they need to be combined.
The ultimate consequence of violating ISP is fragility: "shotgun surgery" or a "butterfly effect". A lot of code can break or require updates because it depends on interfaces or objects that provide more than it needs. Changes become excessive.
The consequence of violating SRP is mainly decreased readability and maintainability. The lack of clear code structure may require people to search across the code base (a single responsibility is too distributed) or within a single large unit (multiple responsibilities crammed together) to make a coherent change. In general, it means increased overhead to fully understand the concern (purpose) of some code snippet. Changes are prevented.
In that way, both principles act like a lower and upper bound for sane change management.
Examples of satisfying SRP without ISP, as provided by the other answers, show that there can be code which truly belongs together (like the stack example quoted from Robert C. Martin) but does too much, is overengineered, etc. In very small examples the effect is not visible, but as the code grows it may be more comfortable to have a dependent class still compile correctly after some unrelated part of its (indirect) dependency was changed, rather than no longer compile because unrelated things changed.
I'm having trouble wrapping my head around state-based functionality for an invoicing system we are currently building. The system will support calculation, manual approval, printing, and archiving of invoices.
At first I thought we should use the State Pattern to model this. An invoice would be the context, which delegates printing, archiving, etc. to its currently assigned state.
But this is obviously a bad idea, because the different states (created, approved, printed, archived) should not support the same operations. E.g., you shouldn't be able to print an invoice that hasn't been approved yet. Throwing exceptions for unsupported operations would be a violation of the LSP. I found a general description of this problem here.
Does anybody have an idea, how to implement this appropriately?
PS: I'm aware that this might sound like some lame-ass homework assignment, but it's not; I need this for a real world system.
You're basically creating a workflow of application states, where at each state the available operations on an invoice change. The state pattern doesn't seem appropriate, but you can still use it if you also create some operations like boolean canPrint() that would have to be used before calling print(). print() would have a contract that allows throwing exceptions if canPrint() returns false. This way, subclasses wouldn't break that contract. Another option is to have a boolean tryPrint(), that will only print if it can, and return whether it printed.
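A sketch of that contract in Kotlin (the names are hypothetical):

interface PrintableInvoice {
    fun canPrint(): Boolean
    // Contract: may throw IllegalStateException when canPrint() is false,
    // so a state that throws still honours the interface (no LSP violation).
    fun print()
    // Alternative: attempt the operation and report whether it happened.
    fun tryPrint(): Boolean = if (canPrint()) { print(); true } else false
}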
But, if the states support mostly non-overlapping operations, then maybe the state pattern is not the solution. Take a step back and look for better ways, without trying to fit a specific pattern to your problem. One way is to create a separate class with the necessary operations for each "state": like CreatedInvoice, ApprovedInvoice, etc. These classes would only have the operations they support.
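A minimal sketch of that alternative in Kotlin (names are made up): each state is its own class and a transition returns the next one, so an unsupported operation is a compile-time error rather than a runtime exception.

class CreatedInvoice(private val amount: Double) {
    fun approve(): ApprovedInvoice = ApprovedInvoice(amount)
}

class ApprovedInvoice(private val amount: Double) {
    fun print(): PrintedInvoice {
        println("printing invoice over $amount")
        return PrintedInvoice(amount)
    }
}

class PrintedInvoice(private val amount: Double) {
    fun archive() = println("archived invoice over $amount")
}

fun main() {
    CreatedInvoice(99.0).approve().print().archive()
    // CreatedInvoice(99.0).print()   // does not compile: a created invoice cannot print
}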
Chain of Responsibility Pattern might help you here.
There can be Calculator, Approver, Printer, and Archiver handler classes. These can override processRequest() from a parent abstract class. An Invoice object can be passed to each handler's processRequest() method. The advantage of using the pattern here is that new handlers can be added dynamically, and the chain's sequence of handlers can be changed easily.
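A sketch of how that could look in Kotlin; the class shapes are my reading of this suggestion (with the chain plumbing factored into the base class), not a canonical implementation:

class Invoice(val id: String)

// Each handler processes the invoice and passes it along the chain.
abstract class InvoiceHandler(private val next: InvoiceHandler? = null) {
    fun processRequest(invoice: Invoice) {
        handle(invoice)
        next?.processRequest(invoice)
    }
    protected abstract fun handle(invoice: Invoice)
}

class Calculator(next: InvoiceHandler? = null) : InvoiceHandler(next) {
    override fun handle(invoice: Invoice) = println("calculated ${invoice.id}")
}
class Approver(next: InvoiceHandler? = null) : InvoiceHandler(next) {
    override fun handle(invoice: Invoice) = println("approved ${invoice.id}")
}
class Printer(next: InvoiceHandler? = null) : InvoiceHandler(next) {
    override fun handle(invoice: Invoice) = println("printed ${invoice.id}")
}
class Archiver : InvoiceHandler() {
    override fun handle(invoice: Invoice) = println("archived ${invoice.id}")
}

fun main() {
    // The chain is assembled at runtime; reordering it is a one-line change.
    Calculator(Approver(Printer(Archiver()))).processRequest(Invoice("INV-1"))
}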
Whether the State Pattern is really appropriate to your situation is not certain, but if it's not, Liskov is not the reason. Throwing some sort of "invalid operation in current state" exception can be defined as possible and valid in the state interface, and then subclasses doing this do not violate LSP.
The classic example used for the State Pattern in the GoF Design Patterns book is a TCPConnection, which definitely has operations not supported or sensible in all states. You can't transmit on a closed connection, for example.
Premise
I believe that there is a way to objectively define "Good" and "Bad" Object-Oriented design techniques and that, as a community, we can determine what these are. This is an academic exercise. If done with seriousness and resolve, I believe it can be of great benefit to the community as a whole. The community will benefit by having a place we can all point to and say, "This technique is 'Good' or 'Bad' and we should or should not use it unless there are special circumstances."
Plan
For this effort, we should focus on Object-Oriented principles (as opposed to Functional, Set-based, or other types of languages).
I'm not planning on accepting one answer; instead, I'd like the answers to contribute to the final collection or be a rational debate of the issues.
I realize that this may be controversial, but I believe we can iron something out. There are exceptions to almost every rule, and I believe this is where the disagreement will fall. We should make declarations and then note relevant exceptions and objections from dissenters.
Basis
I'd like to take a stab at defining "Good" and "Bad":
"Good" - This technique will work the first time and be a lasting solution. It will be easy to change later and will pay the time investment of its implementation quickly. It can be consistently applied and easily recognized by maintenance programmers in the future. Overall, it contributes to the good function and lowers cost of maintenance over the life of the product.
"Bad" - This technique may work in the short term, but soon becomes a liability. It is immediately difficult to change or becomes more difficult over time. The initial investment may be small or large, but it quickly becomes a growing cost, eventually becoming a sunk cost and must be removed or worked around constantly. It is subjectively applied and inconsistent and may be a surprise or not easily recognizable by maintenance programmers in the future. Overall, it contributes to the ultimate increasing cost of maintaining and/or operating the product and inhibits or prevents changes to the product. By inhibiting or preventing change, it becomes not just a direct cost, but an opportunity cost and a significant liability.
Starter
As an example of what I think a good contribution would look like, I'd like to propose a "Good" principle:
Separation of Concerns
[Short description]
Example
[Code or some other type of example]
Goals
[Explanation of what problems this principle prevents]
Applicability
[Why, where, and when would I use this principle?]
Exceptions
[When wouldn't I use this principle, or where might it actually be harmful?]
Objections
[Note any dissenting opinions or objections from the community here]
There are some well understood principles that might form a good starting point:
Open/Closed Principle
Liskov Substitution Principle
Law of Demeter
It is also a good idea to study existing design patterns to find the principles behind them; the most important one is to (generally) prefer composition over inheritance.
Separation of Concerns
Prefer Aggregation to Mixin-style Inheritance
While functionality can be gained by inheriting from a utility class, in many cases it can all be gained using a member of said class.
Example (Boost.Noncopyable):
Boost.Noncopyable is a C++ class that lacks a copy constructor or assignment operator. It can be used as a base class to prevent the subclass from being copied or assigned (this is the common behavior). It can also be used as a direct member.
Convert this:
class Foo : private boost::noncopyable { ... };
To this:
class Foo {
    ...
private:
    boost::noncopyable noncopyable_;
};
Example (Lockable object):
Java introduced the synchronized keyword as an idiom to allow any object to be used in a threadsafe manner. This can be mirrored in other languages to provide mutexes to arbitrary objects. A common example is data structures:
template <typename T>
class ThreadsafeVector : public Vector<T>, public Mutex { ... };
Instead, the two classes could be aggregated together.
template <typename T>
struct ThreadsafeVector {
    Vector<T> vector;
    Mutex mutex;
};
Goals
Inheritance is frequently abused as a code-reuse mechanism. If inheritance is used for anything besides an Is-A relationship, overall code clarity is reduced.
With deeper chains, mixin base classes greatly increase the likelihood of a "Diamond of Death" scenario, wherein a subclass ends up inheriting multiple copies of a mixin class.
Applicability
Any language that supports multiple inheritance.
Exceptions
Any case where the mixin class provides or requires overriding members. In this case, inheritance usually implies an Is-Implemented-In-Terms-Of relationship, and an aggregate will not be sufficient.
Objections
The result of this transformation may lead to public members (e.g. MyThreadSafeDataStructure may have a publicly-accessible Mutex as a component).
I think the short answer is that "good" OO designs are robust under change, with the least code breakage for any requirements change. If you consider all the usual rules, they all tend to that same conclusion.
The difficulty is that you can't evaluate the "goodness" of a design without context; it is, I believe, a theorem that for any modularization there exists a change in requirements that will maximize breakage, requiring every method of every class to be touched.
If you want to be rigorous about it, you can develop a collection of "change cases" and order them by probability, so that you minimize the breakage for the highest-probability changes.
In most cases, though, some well-developed intuition helps a lot: device-specific or platform-specific things tend to change, business rules and business processes tend to change, while the implementations of, say, arithmetic change very rarely. (Not, as you might imagine, never. Consider, for example, a business system that may or may not be able to make use of platform-supported BCD arithmetic.)