Argument documentation in caller and callee structure

Consider the following case: a library contains a caller function which calls multiple callee functions, and does nothing else.

def caller(a, b):
    callee1(a)
    callee2(b)

def callee1(a):
    ...

def callee2(b):
    ...

The assumption is that the user might sometimes use the caller function and sometimes use just one of the callee functions.
I would like to document these functions. As I see it, the options to do this are the following:
Option 1
Document both the caller and the callees.
pros: The user will have a well-documented API whether they want to use the caller function or a callee function.
cons: The documentation of the parameters is duplicated, which can lead to mistakes and requires more maintenance.
Option 2
Document either the caller or the callees.
pros: No documentation duplication.
cons: The user will not have a well-documented API if they wish to use the undocumented function.
Both of these options, as I see them, have consequences I would like to avoid.
So my question is:
Are there any better ways to document this?
Or are there better ways to structure this code to avoid these consequences?
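For illustration, Option 2 might look like the following sketch, where the callees carry the full parameter documentation and the caller's docstring simply refers to them (the docstring wording is my own assumption, not part of the question):

def caller(a, b):
    """Run callee1 and callee2 in sequence.

    See callee1 and callee2 for the meaning of a and b;
    both are passed through unchanged.
    """
    callee1(a)
    callee2(b)

def callee1(a):
    """Process a.

    a -- the value to process (documented once, here).
    """
    ...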


Anti-if purposes: How to check nulls?

I recently heard of the anti-if campaign and the efforts of some OOP gurus to write code without ifs, using polymorphism instead. I just don't get how that should work; I mean, how it should ALWAYS work.
I already use polymorphism (I didn't know about the anti-if campaign), so I was curious about these "bad" and "dangerous" ifs, and I went through my code (Java/Swift/Objective-C) to see where I use if most. It looks like these are the cases:
Checking for null values. This is the most common situation where I ever use ifs. If a value could possibly be null, I have to handle it correctly; to use it, I first have to check that it's not null. I don't see how polymorphism could replace this without ifs.
Checking for the right values. An example: suppose I have a login/signup application. I want to check that the user actually wrote a password, or that it's longer than 5 characters. How could that possibly be done without ifs/switches? Again, it's not about the type but about the value.
(Optional) Checking for errors. Optional because it's similar to point 2 about right values. If I get either a value or an error (passed as params) in a block/closure, how can I handle the error object if I can't check whether it's null or not?
If you know more about this campaign, please answer in its scope. I'm asking this to understand its purpose and how what it advocates could effectively be done.
So, I know that not using ifs at all may not be the smartest idea ever; I'm just asking if and how it could effectively be done in an OOP program.
You'll never completely get rid of ifs, but you can minimize them.
Regarding null value checks, a method that would otherwise return a null value can return a Null Object instead, an object that doesn't represent a real value but implements some of the same behavior as a real value. Its callers can just call methods on the Null Object instead of checking to see if it's null. There is probably still an if inside the method, but there don't need to be any in the callers.
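For example, a minimal Null Object sketch in Python (the class names are invented for illustration):

class Customer:
    def __init__(self, name):
        self.name = name

    def greeting(self):
        return f"Hello, {self.name}!"

class NullCustomer(Customer):
    """Stands in for a missing customer: same interface, neutral behavior."""
    def __init__(self):
        super().__init__(name="guest")

    def greeting(self):
        return "Hello!"

def find_customer(customers, name):
    # The one remaining if lives here, inside the lookup...
    for customer in customers:
        if customer.name == name:
            return customer
    return NullCustomer()

# ...so callers never need one:
print(find_customer([], "Ada").greeting())  # prints "Hello!"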
Regarding correct value checks, the best path is to prevent an object from being instantiated with incorrect attributes. All users of the object can then be confident that they don't have to inspect the object's attributes to use it. Similarly, if an object can have an attribute that is valid or invalid, it can hide that from its users by providing higher-level methods that do the right thing for the current attribute value. Again, there is still an if inside the object, but there don't need to be any in the callers.
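A minimal sketch of that idea in Python, with invented names:

class Password:
    MIN_LENGTH = 6

    def __init__(self, raw):
        # The correctness check happens once, at construction...
        if len(raw) < self.MIN_LENGTH:
            raise ValueError(f"password needs at least {self.MIN_LENGTH} characters")
        self._raw = raw

# ...so any code that holds a Password instance knows it is valid
# and never needs an if to re-inspect it.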
Regarding error checks, there are a couple of strategies that are better than returning a possibly null error value that the caller might forget to check. One is raising an exception. Another is to return an object of a type that can hold either a result or an error (or simply its absence) and provides type-safe, if-free ways to operate on the result when appropriate, like Haskell's Either, or Java's Optional and Haskell's Maybe for the value-or-nothing case.
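As a sketch of that strategy in Python (this wrapper is hand-rolled for illustration, not a standard library type):

class Maybe:
    def __init__(self, value=None):
        self._value = value

    def map(self, fn):
        # The if is here, once, instead of at every call site.
        if self._value is None:
            return self
        return Maybe(fn(self._value))

    def or_else(self, default):
        return self._value if self._value is not None else default

# Callers chain operations without any null checks of their own:
length = Maybe("secret").map(len).or_else(0)   # 6
missing = Maybe(None).map(len).or_else(0)      # 0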
Note also that case statements are just concatenated ifs (in fact I'd have written the code on the campaign's home page with a switch rather than if/else if), and there are also patterns which replace case with polymorphism, such as the Strategy pattern.
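For instance, a case statement that switches on a type tag can collapse into polymorphic dispatch; a Python sketch with invented shape classes:

import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

# No switch on a type tag: each class carries its own behavior.
total = sum(shape.area() for shape in [Circle(1), Square(2)])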
This is a great question and is something that's asked at every OO bootcamp I've been a part of. To begin with, we need to understand why code with a lot of ifs is 'bad' or 'dangerous':
they increase the cyclomatic complexity of the code, making it hard to follow/understand.
they make tests more complicated to write. Ensuring that you test each branch flow in the method under test becomes increasingly more difficult with each conditional and makes test setup cumbersome.
they could be a sign that your code has not been broken into small enough methods
they could be a sign that your methods have not been encapsulated well
However, there is one important thing to remember: ifs cannot (and should not) be eliminated from the code completely. But we can generally abstract them away using techniques like polymorphism, extracting small behaviours, and encapsulating those behaviours in the appropriate classes.
Now that we know some of the reasons why we should avoid ifs, let's tackle your questions:
Checking for null values: The Null Object pattern helps you eliminate null checks from your code (polymorphism FTW). Instead of returning null, you return a Special Case NullObject representation of the expected object. This NullObject has the same interface as your actual object, and you can safely call any of the object's methods without worrying about a null pointer exception being thrown.
Checking for correctness of values: There are a lot of ways to do this. For example, you could create a separate ValidationRule class for each of your validations and then chain calls to them together when you want to validate your object. Notice that the ifs still remain, but they get abstracted away into the individual ValidationRule implementations. Look up the Command pattern and the Chain Of Responsibility pattern for ideas.
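A sketch of those chained ValidationRule objects in Python (the rule classes are invented for the example):

class NotBlankRule:
    def check(self, value):
        return bool(value.strip())

class MinLengthRule:
    def __init__(self, n):
        self.n = n

    def check(self, value):
        # The if disappears into a boolean expression inside the rule...
        return len(value) >= self.n

def validate(value, rules):
    # ...and the caller just runs the chain.
    return all(rule.check(value) for rule in rules)

print(validate("hunter2", [NotBlankRule(), MinLengthRule(6)]))  # True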
It's better to use an if to check for null than to raise an exception. In common cases, checking for null also helps us prevent operations on uninitialized variables.
Using switch plus SOLID; other techniques follow from this.

Data provider calling a delegate: specifics or generic?

I have an XML parser which will parse 17 different XML documents (I'm simplifying this).
When the parser has finished its job, it calls back the object that made the request.
First way
A single method that looks like
- (void)didReceiveObject:(NSObject *)object ofType:(MyObjectType)type
with MyObjectType being an enum.
In this method, I check the type and redirect the object to the corresponding method.
Second way
There is a callback method for each of the 17 types of object I can receive.
- (void)didReceiveFoo:(MYFoo *)foo
- (void)didReceiveBar:(MYBar *)bar
... and so on
Which way of using delegates will be better?
We had a discussion about this with a colleague and couldn't find one way more appealing than the other. It seems like it's just a matter of deciding which method to call, from the parser or within the delegate...
Even when thinking about adding future methods/delegate callbacks, we don't see any real problem.
Is one of these ways better than the other? Is there another way?
Why not go with
- (void)didReceiveObject:(NSObject *)object
and then inspect the class type?
This seems cleaner and more extensible to me, because it means you can parse other objects in the future without adding more callbacks.
(I know this is the same as option one, but I wanted to point out that your second argument was unnecessary.)
First method:
Pros:
More flexible to future changes.
Cons:
May result in a large switch statement or messy if ... else if ... else statement.
Probably results in a series of explicit methods anyway.
Requires type cast.
Second method:
Pros:
No type casting.
If methods are optional, delegate is only bothered with the objects it's interested in.
Cons:
If methods are not optional and the interface is expanded later, all delegates will have warnings until the new methods are implemented.
If methods are not optional, this can be a lot of methods to implement for every delegate.
Generally when building delegate interfaces I lean towards generics for future extensibility. Changing an API, especially with open source code, can be very difficult. Also, I don't quite understand why you have one XML parser doing so much. You may want to consider a different design. 17 different XML documents seems like a lot. That aside, I'll propose a third method.
Third method:
Create a dictionary that maps strings to blocks. The blocks would probably be of type void(^BlockName)(id obj). Your parser would define a series of strings that will be the keys for your various blocks. For example,
NSString * const kFooKey = @"FooKey";
NSString * const kBarKey = @"BarKey";
// And so on...
Whoever creates the XML parser registers a block for each key they are interested in; they only need to register for those keys, and it's completely flexible to future change. Since you are registering for explicit keys/objects, you can assert the passed-in type without a type cast (essentially Design by Contract). This might be overkill for what you want, but I've found similar designs very beneficial in my code. It combines the pros of both of your solutions. Its main downfall is if you want to use an SDK that doesn't have blocks. However, blocks are becoming a de facto standard with Objective-C.
On top of this you may want to define a protocol that encompasses the common functionality of your 17 objects, if you haven't done so already. This would change your block type to void(^BlockName)(id<YourProtocol> obj).
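Setting the Objective-C specifics aside, the shape of this third method can be sketched in Python, with a plain dict standing in for the key-to-block registry (all names invented):

FOO_KEY = "FooKey"
BAR_KEY = "BarKey"

class Parser:
    def __init__(self):
        self._handlers = {}

    def register(self, key, handler):
        # Clients register only for the keys they care about.
        self._handlers[key] = handler

    def emit(self, key, obj):
        # Called when a document of the given kind has been parsed.
        handler = self._handlers.get(key)
        if handler is not None:
            handler(obj)

parser = Parser()
parser.register(FOO_KEY, lambda foo: print("got a Foo:", foo))
parser.emit(FOO_KEY, {"id": 1})  # got a Foo: {'id': 1}
parser.emit(BAR_KEY, {"id": 2})  # ignored: no handler registered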
Here's the decision.
We will implement both and see which way is used more.
The first way is the easiest and fastest, so we will keep it for internal needs.
But we may be shipping this code as a static library, and we want to expose the minimal amount of information, so we will also stick with the second way.
As there should be a big chunk of code for each callback, the generic way would certainly end up as the big switch statement rbrown pointed out.
Thank you for your help.

Is it good convention for a class to perform functions on itself?

I've always been taught that if you are doing something to an object, that should be an external thing, so one would Save(Class) rather than having the object save itself: Class.Save().
I've noticed that in the .Net libraries, it is common to have a class modify itself as with String.Format() or sort itself as with List.Sort().
My question is, in strict OOP is it appropriate to have a class which performs functions on itself when called to do so, or should such functions be external and called on an object of the class' type?
Great question. I have just recently reflected on a very similar issue and was eventually going to ask much the same thing here on SO.
In OOP textbooks, you sometimes see examples such as Dog.Bark() or Person.SayHello(). I have come to the conclusion that those are bad examples. When you call those methods, you make a dog bark or a person say hello. However, in the real world you couldn't do this; a dog decides by itself when it's going to bark, and a person decides for themselves when they will say hello to someone. Therefore, these methods would more appropriately be modelled as events (where supported by the programming language).
You would e.g. have a function Attack(Dog), PlayWith(Dog), or Greet(Person) which would trigger the appropriate events.
Attack(dog) // triggers the Dog.Bark event
Greet(johnDoe) // triggers the Person.SaysHello event
As soon as you have more than one parameter, it won't be so easy to decide how best to write the code. Let's say I want to store a new item, say an integer, into a collection. There are many ways to formulate this; for example:
StoreInto(1, collection) // the "classic" procedural approach
1.StoreInto(collection) // possible in .NET with extension methods
Store(1).Into(collection) // possible by using state-keeping temporary objects
According to the thinking laid out above, the last variant would be the preferred one, because it doesn't force an object (the 1) to do something to itself. However, if you follow that programming style, it will soon become clear that this fluent interface-like code is quite verbose, and while it's easy to read, it can be tiring to write or even hard to remember the exact syntax.
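For illustration, the third variant can be sketched in Python with a small state-keeping temporary object (the capitalized names mirror the pseudocode above):

class Store:
    """Temporary object that remembers the item until Into() is called."""
    def __init__(self, item):
        self.item = item

    def Into(self, collection):
        collection.append(self.item)
        return collection

items = []
Store(1).Into(items)  # items is now [1]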
P.S.: Concerning global functions: In the case of .NET (which you mentioned in your question), you don't have much choice, since the .NET languages do not provide for global functions. I think these would be technically possible with the CLI, but the languages disallow that feature. F# has global functions, but they can only be used from C# or VB.NET when they are packed into a module. I believe Java also doesn't have global functions.
I have come across scenarios where this lack is a pity (e.g. with fluent interface implementations). But generally, we're probably better off without global functions, as some developers might always fall back into old habits, and leave a procedural codebase for an OOP developer to maintain. Yikes.
Btw., in VB.NET, however, you can mimic global functions by using modules. Example:
Globals.vb:
Module Globals
    Public Sub Save(ByVal obj As SomeClass)
        ...
    End Sub
End Module
Demo.vb:
Imports Globals
...
Dim obj As SomeClass = ...
Save(obj)
I guess the answer is "It depends"... For persistence of an object, I would side with having that behavior defined in a separate repository object. So with your Save() example I might have this:
repository.Save(class)
However with an Airplane object you may want the class to know how to fly with a method like so:
airplane.Fly()
This is one of the examples I've seen from Fowler about an anemic domain model. I don't think in this case you would want to have a separate service like this:
new airplaneService().Fly(airplane)
With static methods and extension methods it makes a ton of sense, as in your List.Sort() example. So it depends on your usage patterns. You wouldn't want to have to new up an instance of a ListSorter class just to be able to sort a list like this:
new listSorter().Sort(list)
In strict OOP (Smalltalk or Ruby), all methods belong to an instance object or a class object. In "real" OOP (like C++ or C#), you will have static methods that essentially stand completely on their own.
Going back to strict OOP, I'm more familiar with Ruby, and Ruby has several "pairs" of methods that either return a modified copy or return the object in place -- a method ending with a ! indicates that the message modifies its receiver. For instance:
>> s = 'hello'
=> "hello"
>> s.reverse
=> "olleh"
>> s
=> "hello"
>> s.reverse!
=> "olleh"
>> s
=> "olleh"
The key is to find some middle ground between pure OOP and pure procedural that works for what you need to do. A Class should do only one thing (and do it well). Most of the time, that won't include saving itself to disk, but that doesn't mean Class shouldn't know how to serialize itself to a stream, for instance.
I'm not sure what distinction you seem to be drawing when you say "doing something to an object". In many if not most cases, the class itself is the best place to define its operations, as under "strict OOP" it is the only code that has access to internal state on which those operations depend (information hiding, encapsulation, ...).
That said, if you have an operation which applies to several otherwise unrelated types, then it might make sense for each type to expose an interface which lets the operation do most of the work in a more or less standard way. To tie it in to your example, several classes might implement an interface ISaveable which exposes a Save method on each. Individual Save methods take advantage of their access to internal class state, but given a collection of ISaveable instances, some external code could define an operation for saving them to a custom store of some kind without having to know the messy details.
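A Python sketch of that ISaveable idea (the interface name comes from the example above; the rest is invented):

import io
from abc import ABC, abstractmethod

class ISaveable(ABC):
    @abstractmethod
    def save(self, stream):
        """Write this object's state to the stream."""

class User(ISaveable):
    def __init__(self, name):
        self.name = name

    def save(self, stream):
        # Only User touches its internal state...
        stream.write(f"user:{self.name}\n")

def save_all(items, stream):
    # ...while external code drives the operation generically.
    for item in items:
        item.save(stream)

buf = io.StringIO()
save_all([User("ada"), User("alan")], buf)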
It depends on what information is needed to do the work. If the work is unrelated to the class (mostly equivalently, can be made to work on virtually any class with a common interface), for example, std::sort, then make it a free function. If it must know the internals, make it a member function.
Edit: Another important consideration is performance. In-place sorting, for example, can be miles faster than returning a new, sorted copy. This is why quicksort beats merge sort in the vast majority of cases, even though merge sort has the better worst-case guarantee: quicksort can be performed in place, whereas practical merge sorts need auxiliary storage. Just because it's technically possible to perform an operation within the class's public interface doesn't mean that you actually should.

Law of Demeter and Class Constructors

The Law of Demeter does not prevent passing objects into class constructors. However, it does forbid getting that same object back later and calling a method on it to get a scalar value out. Instead, a proxy method is supposed to be created that returns the scalar value instead. My question is, why is it acceptable to pass an object into a class constructor but unacceptable to get the same object back later and pull a value from it?
Because the Law of Demeter says that you should not design the external interface of an object to make it look as if it is composed of certain other objects with known interfaces, that clients can just grab hold of and access.
You pass an object into the constructor to tell your new object how to behave, but it is none of your business whether the object keeps that parameter object around, or keeps a copy of it, or just looks at it once and forgets it ever existed. By having a getMyParameterBack method, you've committed all future implementations to be able to produce that whole object on demand, and all clients to couple with two interfaces instead of one.
For example, if you pass in a URL parameter to your HTTPRequest object's constructor, then that doesn't mean HTTPRequest should have a getURL method which returns a URL object on which the caller is then expected to call getProtocol, getQueryString, etc. If someone who has an HTTPRequest object might want to know the protocol of the request, they should (the Law says) find out by calling getProtocol on the object they have, not on some other object that they happen to know HTTPRequest is storing internally.
The idea is to reduce coupling - without the Law of Demeter, the user has to know the interface to HTTPRequest and URL in order to get the protocol. With the Law, they only need the interface to HTTPRequest. And HTTPRequest.getProtocol() clearly can return "http" without needing some URL object to be involved in the discussion.
The fact that sometimes the user of the request object happens to be the one who created it, and therefore is using the URL interface too in order to pass the parameter, is neither here nor there. Not all users of HTTPRequest objects will have created them themselves. So clients which are entitled under the Law to access the URL because they created it themselves, can do it that way rather than grabbing it back off the Request. Clients which did not create the URL can't.
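A minimal Python sketch of that proxy method (HTTPRequest and getProtocol come from the example above; the internals are assumptions):

class URL:
    def __init__(self, protocol, host):
        self.protocol = protocol
        self.host = host

class HTTPRequest:
    def __init__(self, url):
        # Whether the request keeps the URL object, copies it, or
        # discards it is an implementation detail hidden from clients.
        self._protocol = url.protocol

    def get_protocol(self):
        # Clients couple to HTTPRequest alone, never to URL.
        return self._protocol

request = HTTPRequest(URL("http", "example.com"))
print(request.get_protocol())  # "http", with no URL object in the conversation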
Personally I think the Law of Demeter as usually stated in simple form, is cracked. Are they seriously saying that if my object has a string Name field, and I want to know whether the Name contains any non-ASCII characters, then I must either define a NameContainsNonASCIICharacters method on my object instead of looking at the string itself, or else add a visitName function to the class taking a callback function in order to work around the restriction by ensuring that the string is a parameter to a function I've written? That doesn't change the coupling at all, it just replaces getter methods with visitor methods. Should every class which returns an integer have a full set of arithmetic operations, in case I want to manipulate the return value? getPriceMultipliedBy(int n)? Surely not.
What it is useful for, is that when you break it you can ask yourself why you're breaking it, and whether you could design a better interface by not breaking it. Frequently you can, but really it depends what kinds of objects you're talking about. Certain interfaces can safely be coupled against vast swathes of code - things like integer, string, and even URL, which represent widely-used concepts.
JP's answer is pretty good, so this is just a supplement, not a disagreement or other replacement.
The way I understand this heuristic is that a call through A shouldn't break because of class B changing. So if you chain your calls with a.b.foo(), the caller becomes dependent on B's interface as well as A's, violating the rule. Instead, you're supposed to call a.BFoo(), which calls b.foo() for you.
This is a good rule of thumb, but it can lead to awkward code that doesn't really address the dependency so much as enshrine it. Now A has to offer BFoo forever, even when B no longer offers Foo. Not much of an improvement and it would be arguably better in at least some cases if changes to B broke the caller that wants Foo, not B itself.
I would also add that, strictly speaking, this rule is broken constantly for a certain group of ubiquitous classes, such as string. Perhaps it's acceptable to decide which classes are likewise ubiquitous within a particular layer of an application and freely ignore Demeter's "Rule" for them.
The idea is that you only talk to your immediate friends. So, you don't do this ...
var a = new A();
var x = a.B.doSomething();
Instead you do this ...
var a = new A();
var x = a.doSomething(); // where a.doSomething might call b.doSomething();
It has its advantages, as things become simpler for callers (Car.Start() versus Car.Engine.Start()), but you get lots of little wrapping methods. You can also use the Mediator pattern to mitigate this type of "violation".

Explicit API methods vs. generalised parameter-based API methods

When defining a customer-accessible API, what is the preferred industry practice between the following:
a) Defining a set of explicit API methods, each with a very narrow and specific purpose, for example:
SetUserName <name>
SetUserAge <age>
SetUserAddress <address>
b) Defining a set of more generalised parameter-based API methods, for example:
SetUserAttribute <attribute>
enum attribute {
    name,
    age,
    address
}
My opinion:
In favour of (a)
For boolean-based methods (e.g. EnableFoo) I would definitely favour option (a), as the intentions are much clearer, it's less likely to require extensions in the future, and it makes for more readable code.
For example, a method called EnableDisableFoo which takes a boolean parameter indicating whether to enable or disable would not be very clear, nor have a cohesive purpose.
It's where there are multiple options that the problem gets more complicated.
In favour of (b)
Option (b) is a great way of providing extensibility in the API, but at the expense of usability. With option (a), the API method name itself gives enough information to indicate what it is doing. With option (b), the user has to look up both the method name and the appropriate enumeration/parameter to use. In theory this makes option (b) worse from a usability standpoint, though maybe having fewer methods is a good thing, so even this isn't completely true.
Other thoughts
It's necessary to strike a good balance between usability and extensibility, and they are often at odds with each other. But I'd like to think there is a more objective way to analyse this, rather than relying on the opinion of the API designer.
Does anyone have any thoughts on this?
I would personally argue for (a), since our goal is to make the "static" code as accurate and reliable as possible.
By using the generalized form, we are introducing a risk for runtime errors. For example, I could set an attribute of type age with a value that is actually a string, etc.
This is very similar to the argument for defining and using enums or explicit types rather than using and returning ints in the old C style, as you get one more level of assurance.
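For instance, rendering the two styles in Python (invented names), the explicit setter names its expected type while the generic one cannot:

class User:
    def set_age(self, age: int):
        # Explicit method: the signature documents the expected type,
        # and a type checker can enforce it.
        self.age = age

    def set_attribute(self, attribute, value):
        # Generic method: nothing stops set_attribute("age", "not a number").
        setattr(self, attribute, value)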
While I agree that (b) allows extensibility, I have not seen too many APIs that would require this sort of extensibility for completely different types of attributes. The only common use of (b) is in polymorphic code, where the function could technically accept anything, including extensions.
Another consideration is whether you want to set all attributes, and whether to set them simultaneously. For example, when you want to send something to a printer there may be dozens of parameters to set (landscape or portrait, number of copies, page size, resolution, etc.). Instead of defining an API which needs to be invoked dozens of times, you can define a single function which takes a struct as a parameter, where the struct contains dozens of fields; the caller initializes the struct at its leisure and then passes it to the API in a single function call.
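A sketch of that struct-parameter style in Python, echoing the printer example (field names invented):

from dataclasses import dataclass

@dataclass
class PrintSettings:
    landscape: bool = False
    copies: int = 1
    page_size: str = "A4"
    resolution_dpi: int = 300

def print_document(document, settings: PrintSettings):
    # One call carries all the parameters, initialized at the caller's leisure.
    ...

print_document("report.pdf", PrintSettings(copies=2, landscape=True))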
I think it depends on the code that you're writing. If you're dealing with things that always go together (i.e., you're going to use/change age always along with name), then go for (b); otherwise (a) is fine.
But don't overdo (a), because then you're just going to write a lot more lines and get a lot less done. It's a good idea if you're paid for the amount of code you write, though :)