Are there any situations where a side effect on a "get" or "calculate" operation is legitimate? - oop

I've just finished a six hour debugging session for a weird UI effect where I found that my favorite framework's implementation of an interface function called "getVisibleRegion" disabled some UI feature (and apparently forgot to restore it).
I've filed a bug with the framework, but this made me think about proper design: under what conditions is it legitimate to have any side-effects on an operation with a name that implies a mere calculation/getting operation?
For those interested in the actual details: I had a report on a bug where my plug-in kept breaking Eclipse's code folding, so that the folding bar disappeared and it was impossible to "unfold" or see folded code.
I traced it down to a call to getVisibleRegion() on an ITextViewer whose type represents a source code viewer. Now, ITextViewer's documentation does state that "Viewers implementing ITextViewerExtension5 may be forced to change the fractions of the input document that are shown, in order to fulfill this contract". The actual implementation, however, took this a little too liberally and just disabled projection (folding) permanently, never to bring it back.

The biggest reason I can think of is caching results.

I would say None.

This may be such an edge case that it doesn't even qualify as a side effect, but if the result of the calculation is cached in the object, then that would be acceptable. Even so, it shouldn't make a difference to the caller.

I would say only if it's very obvious that the side effect will occur. Here is a quick example:
MakeMyLifeEasyObject mmleo = new MakeMyLifeEasyObject(x, y, z, defaultValue, 12, something);
Object uniqueObjectOne = mmleo.getNewUniqueObject();
Object uniqueObjectTwo = mmleo.getNewUniqueObject();
System.out.println(uniqueObjectOne.getId() == uniqueObjectTwo.getId()); // Prints "false"
Now in my theory here, the MakeMyLifeEasyObject has some internal counter (like a primary key on a DB table). There is a side effect of the get. I can also come up with the idea of something like this:
Object thing = list.getNextObjectAndRemoveFromList();
That would make sense too.
Now the caveat of these is that in both cases, it's just better to rename the method.
The first one would probably be better as createNewUniqueObject(), while in the second a different name (in this case pop()) would be better.
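A rough Java sketch of the renaming suggested above. The class UniqueObjectFactory and its integer ids are hypothetical stand-ins for MakeMyLifeEasyObject; the only point is that names like create... and pop announce the state change that a get... name would hide.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for MakeMyLifeEasyObject.
class UniqueObjectFactory {
    private int nextId = 0; // internal counter, like a primary key sequence

    // "create" signals that every call advances the counter.
    int createNewUniqueId() {
        return nextId++;
    }

    public static void main(String[] args) {
        UniqueObjectFactory factory = new UniqueObjectFactory();
        int first = factory.createNewUniqueId();
        int second = factory.createNewUniqueId();
        System.out.println(first == second); // false: each call yields a new id

        // Deque.pop() is the conventional name for "get the next element and remove it".
        Deque<String> stack = new ArrayDeque<>();
        stack.push("a");
        stack.push("b");
        String top = stack.pop();
        System.out.println(top + " removed, " + stack.size() + " left");
    }
}
```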
When it's not some semi-contrived example like I gave above, I'd say the ONLY side effect that should be going on is creating/updating some cache, if the value takes a long time to create or is used often enough that it needs to be sped up.
An example of this would be an object that holds a bunch of strings. You have a method, getThingToPrint() that needs to concatenate a bunch together. You could create a cache when that's called, and that would be a side effect. When you update one of the strings that things are based on, the set would invalidate the cache (or update it).
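The caching idea described above could look roughly like this in Java. The class PrintableStrings and the buildCount field are assumptions made for the illustration; buildCount exists only to make the otherwise invisible side effect observable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: holds strings and caches their concatenation.
class PrintableStrings {
    private final List<String> parts = new ArrayList<>();
    private String cached;      // null means the cache is empty or stale
    private int buildCount = 0; // how many times the concatenation really ran

    void add(String part) {
        parts.add(part);
        cached = null; // any change to the inputs invalidates the cache
    }

    void set(int index, String part) {
        parts.set(index, part);
        cached = null;
    }

    // The only side effect of this getter is filling the cache.
    String getThingToPrint() {
        if (cached == null) {
            cached = String.join("", parts);
            buildCount++;
        }
        return cached;
    }

    int getBuildCount() {
        return buildCount;
    }

    public static void main(String[] args) {
        PrintableStrings p = new PrintableStrings();
        p.add("Hello, ");
        p.add("world");
        System.out.println(p.getThingToPrint());
        System.out.println(p.getThingToPrint()); // served from the cache
        System.out.println("concatenations: " + p.getBuildCount());
    }
}
```

The caller gets the same value whether or not the cache was hit, which is why this kind of side effect is usually considered harmless.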
Something like what you described? Definitely sounds like a bug. I can't think of a situation where that would be a good idea. If that is intended behavior and not a bug, then it should be renamed to something else (e.g. disableThingAndGetVisibleRegion()).

obj.getBusyDoingStuff()

Related

When should we use the advanced parameters of save()?

Normally we save an instance into the database simply with inst.save(), but Django uses user.save(using=self._db) in its source code. Also, it uses user.save(update_fields=['last_login']) elsewhere.
This somewhat confuses me. To make things worse, the documentation for the save() method is extremely brief:
Model.save(force_insert=False, force_update=False,
using=DEFAULT_DB_ALIAS, update_fields=None)[source]
If you want customized saving behavior, you can override this save()
method. See Overriding predefined model methods for more details.
The model save process also has some subtleties; see the sections
below.
It doesn't even contain the explanation of those parameters!
My question is: how do I know when I should use the advanced parameters of save()? If I'm implementing a custom model, I would definitely write user.save().
I've done a couple of experiments myself, like changing user.save(using=self._db) to user.save(), and nothing went wrong, but I don't want to be surprised someday. Also, the parameters must be passed for some reason, right?
The answer is you will know when you need to :)
For now resort to this practice
from django.db import models

class MyModel(models.Model):
    def save(self, *args, **kwargs):
        # do whatever custom work is needed, then pass every argument through
        super(MyModel, self).save(*args, **kwargs)
This way you make sure that you don't accidentally drop any of those mysterious parameters. But let's try to demystify some of them.
using=self._db
This is to facilitate the use of multiple databases in a single Django app, which most apps don't really need.
update_fields
If save() is passed a list of field names in keyword argument
update_fields, only the fields named in that list will be updated.
This may be desirable if you want to update just one or a few fields
on an object. There will be a slight performance benefit from
preventing all of the model fields from being updated in the database
https://docs.djangoproject.com/en/1.11/ref/models/instances/
So the link to the source code is a specific instance where they have used this feature. Quite useful to keep track of when a user logged in for the last time without updating the entire record.
force_insert vs force_update
These tell Django to force one operation or the other. This is also explained to some extent in https://docs.djangoproject.com/en/1.11/ref/models/instances/
The example of user.save(using=self._db) I believe is redundant when you only have one db, usually defined as "default". This example simply points out that if you have multiple dbs, you can pass in which one to use when saving.
update_fields is also handy when you keep a reference to an instance for a long time (for example in a middleware) and the data might be changed from somewhere else. In these cases you either have to perform a costly refresh_from_db() on the instance to get the newest data from the database, or when you only want to override specific attributes you can omit the refresh_from_db() call and just use save(update_fields=['attr1', 'attr2']). This will leave all other attributes unchanged in the database except for the ones you specified. If you omit update_fields in this case all values would be overwritten in the database with the values of your cached reference to the instance, leading to information loss.

Objective-C: if statement in custom setter

What's the purpose of if statement in a custom setter? I see this routine a lot in sample code. Provided using ARC, why bother checking the equality?
- (void)setPhotoDatabase:(UIManagedDocument *)photoDatabase
{
    if (_photoDatabase != photoDatabase) {
        _photoDatabase = photoDatabase;
        ...
    }
}
The important part is typically what follows the change (what's in ...): side-effects after assigning new value, which can be very costly.
It's a good idea to restrict those changes to avoid triggering unnecessary and potentially very costly side effects. Say you change a document: you will likely need to update a good percentage of the UI related to that document, as well as the model.
When the condition is checked first, a significant amount of unnecessary work can be short-circuited.
Such unnecessary side effects could easily eclipse your app's real work regarding CPU, drawing, object creation, writes to disk -- pretty much anything.
Believe it or not, a lot of apps do perform significant amounts of unnecessary work, even if they are very well designed. Drawing and UI updates in view-based rendering systems are probably the best example I can think of. In that domain, there are a ton of details one could implement to minimize redundant drawing.
One of the main reasons to override and implement custom setters is to execute additional code in response to changes of the property. If the property doesn't actually change, why execute that code?
The answer is usually in the ... section that you have commented out: when there is nothing there, the code makes no sense. However, a typical thing to have in that spot is some sort of notification of your own delegate, like this:
[myDelegate photoDatabaseDidChanged:photoDatabase];
This should not be called unless the photoDatabase has indeed changed. The call may be costly, anywhere from "expensive" to "very expensive", depending on what the delegate really does. It could be updating a screen with the images from the new library, or it could be saving new images into the cloud. If there is no need to report the change, you could be wasting the CPU cycles, along with the battery and the network bandwidth. Your code has no way of knowing what the delegate is going to do, so you need to avoid calling back unless the change did happen.
If you check for equality you can prevent the redundant assignment of the parameter that is passed into the method.
This way you can avoid the cost (even if it's small) of doing all the code within the brackets if there is no change to the photoDatabase in your sample method.
Ex (Extending your example):
- (void)setPhotoDatabase:(UIManagedDocument *)photoDatabase
{
    if (_photoDatabase != photoDatabase)
    {
        _photoDatabase = photoDatabase;
        // do stuff
        // do more stuff
        // do even more stuff
        // do something really expensive
    }
}
As you can see from the example, if you check first to see if the photoDatabase doesn't equal what is passed in, you can just exit the method and not run additional code that isn't necessary.
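For readers more at home in Java, the same guard can be sketched as below. PhotoBrowser, the String-typed database and the notification counter are all hypothetical. One deliberate difference: the Objective-C code compares pointers with !=, while this sketch uses Objects.equals, which compares by value.

```java
import java.util.Objects;

// Hypothetical Java analogue of the guarded setter above.
class PhotoBrowser {
    private String photoDatabase;         // stand-in for the UIManagedDocument
    private int changeNotifications = 0;  // counts how often the costly path ran

    void setPhotoDatabase(String photoDatabase) {
        // Skip the assignment and everything after it when nothing changed.
        if (!Objects.equals(this.photoDatabase, photoDatabase)) {
            this.photoDatabase = photoDatabase;
            onPhotoDatabaseChanged();
        }
    }

    // Placeholder for the potentially expensive follow-up work
    // (UI refresh, delegate callback, network sync, ...).
    private void onPhotoDatabaseChanged() {
        changeNotifications++;
    }

    int getChangeNotifications() {
        return changeNotifications;
    }

    public static void main(String[] args) {
        PhotoBrowser browser = new PhotoBrowser();
        browser.setPhotoDatabase("vacation");
        browser.setPhotoDatabase("vacation"); // no change, no notification
        browser.setPhotoDatabase("family");
        System.out.println("notifications: " + browser.getChangeNotifications());
    }
}
```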

Is it OK to create an object inside a function

I work on a class in VBA, that encapsulates downloading stuff with MSXML2.XmlHttp.
There are three possibilities for the return value: Text, XML and Stream.
Should I create a function for each:
aText=myDownloader.TextSynchronous(URL,formData,dlPost,....)
aXml.load myDownloader.XmlSynchronous(URL,formData,dlPost,....)
Or can I just return the XmlHttpObject I created inside the class and then have
aText=myDownloader.Synchronous(URL,formData,dlPost,.....).ResponseText
aXML=myDownloader.Synchronous(URL,formData,dlPost,.....).ResponseXML
In the former case I can set the obj to nothing in the class but have to write several functions that are more or less the same.
In the latter case, I rely on the "garbage collector" but have a leaner class.
Both should work, but which one is better coding style?
In my opinion, the first way is better because you don't expose low level details to a high level of the abstraction.
I did something similar with a web crawler in Java: I have one class only to manipulate the URL connection and fetch all the needed data (low level), and a high-level class that uses the low-level class and returns an object called Page.
You can have a third method that only executes myDownloader.Synchronous(URL,formData,dlPost,.....) and stores the returned object in a private variable, while the other methods only manipulate this object. This way, you will only open the connection once.
After much seeking around in the web (triggered by the comment by EmmadKareem) I found this:
First of all, don't do localObject=Nothing at the end of a method - the variable goes out of scope anyway and is discarded. See this older but enlightening post on MSDN.
VBA uses reference counting, and apart from some older bugs in ADO this seems to work quite well and (as I understand it) immediately discards resources that are not used anymore. So from a performance/memory usage point of view this does not seem to be a problem.
As to the coding style: I think the uncomfortable feeling I had when I designed this could go away by simply renaming the function to myDownloader.getSyncDLObj(...) or some such.
There seem to be two camps on code style. One promotes clean code, which is easy to read but takes five lines every time you use it. Its most important principle is "every function should do one thing and one thing only". Their approach would probably look something like
myDownloader.URL="..."
myDownloader.method=dlSync
myDownloader.download
aText=myDownloader.getXmlHttpObj.ResponseText
myDownloader.freeResources
and the other is OK with the more cluttered, but less line-consuming
aText=myDownloader.getSyncObj(...).ResponseText
Both have their merits, and neither is wrong, dangerous or frowned upon. As this is a helper class and I use it to remove the inner workings of the xmlhttp from the main code, I am more comfortable with the second approach here. (One line for one goal ;)
I would be very interested in anyone's take on that matter.

Manipulating Objects in Methods instead of returning new Objects?

Let’s say I have a method that populates a list with some kind of objects. What are the advantages and disadvantages of following method designs?
void populate (ArrayList<String> list, other parameters ...)
ArrayList<String> populate(other parameters ...)
Which one I should prefer?
This looks like a general issue about method design but I couldn't find a satisfying answer on google, probably for not using the right keywords.
The second one seems more functional and thread safe to me. I'd prefer it in most cases. (Like every rule, there are exceptions.)
The owner of the populate method could return an immutable List (why ArrayList?).
It's also thread safe if there is no state modified in the populate method. Only passed in parameters are used, and these can also be immutable.
Other than what @duffymo mentioned, the second one is easier to understand, and thus to use: it is obvious what its input and output are.
Advantages to the in-out parameter:
You don't have to create as many objects. In languages like C or C++, where allocation and deallocation can be expensive, that can be a plus. In Java/C#, not so much -- GC makes allocation cheap and deallocation all but invisible, so creating objects isn't as big a deal. (You still shouldn't create them willy-nilly, but if you need one, the overhead isn't as bad as in some manual-allocation languages.)
You get to specify the type of the list. Potential plus if you need to pass that list to some other code you don't control later.
Disadvantages:
Readability issues.
In almost all languages that support function arguments, the first case is assumed to mean "do something with the entries in this list". Modifying args violates the Principle of Least Astonishment. The second is assumed to mean "give me a list of stuff", which is what you're after.
Every time you say "ArrayList", or even "List", you take away a bit of flexibility. You add some overhead to your API. What if i don't want to create an ArrayList before calling your method? I shouldn't have to, if the method's whole purpose in life is to return me some entries. That's the API's job.
Encapsulation issues:
The method being passed a list to fill can't assume anything about that list (even that it's a list at all; it could be null).
The method passing the list can't guarantee anything about what the method does with it. If it's working correctly, sure, the API docs can say "this method won't destroy existing entries". But considering the chance of bugs, that may not be worth trusting. At least if the method returns its own list, the caller doesn't have to worry about what was in it before. And it doesn't have to worry about a bug from a thousand miles away corrupting data it should never have affected.
Thread safety issues.
The list could be locked by another thread, meaning if we try and lock on it now it could potentially lock up the app.
Or, if not locked, it could still be modified by another thread, in which case we're no less screwed. Unless you're going to write extra code to handle concurrent-modification exceptions everywhere.
Returning a new list means every call to the method can have its own list. No thread can mess with another thread's return value, unless the class is very badly designed.
Side point: Being able to specify the type of the list often leads to dependencies on the type of the list. Notice how you're passing ArrayLists around everywhere. You're painting yourself into corners by saying "This is an ArrayList" when you don't need to, but when you're passing it to a dozen methods, that's a dozen methods you'll have to change. (Not entirely related, but only slightly tangential. You could change the types to List rather than ArrayList and get rid of this. But the more you're passing that list around, the more places you'll need to change.)
Short version: Unless you have a damn good reason, use the first syntax only if you're using the existing contents of the list in your method. IE: if you're modifying it, or doing something with the existing values. If you intend to return a list of entries, then return a List of entries.
The second method is the preferred way for many reasons.
Primarily because the function signature is clearer and shows what its intentions are.
It is actually recommended that you NEVER change the value of a parameter that is passed in to a function unless you explicitly mark it as an "out" parameter.
It will also be easier to use in expressions,
and it will be easier to change in the future, including taking it to a more functional approach (for threading, etc.) if you would like to.
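To make the comparison concrete, here is a minimal Java sketch of both signatures side by side. The count parameter and the generated "itemN" values are placeholders for the question's "other parameters"; nothing here is taken from the asker's real code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PopulateStyles {
    // Style 1: in-out parameter. The caller must supply a mutable, non-null
    // list, and whatever was already in it stays in it.
    static void populate(List<String> list, int count) {
        for (int i = 0; i < count; i++) {
            list.add("item" + i);
        }
    }

    // Style 2: the method owns the list and hands back a read-only view.
    static List<String> populate(int count) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add("item" + i);
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        // Style 1 needs setup and mixes new entries with existing ones.
        List<String> existing = new ArrayList<>();
        existing.add("old");
        populate(existing, 2);
        System.out.println(existing);

        // Style 2 can be used directly inside an expression.
        System.out.println(populate(3).size());
    }
}
```

Note that the return type is List rather than ArrayList, so the implementation can later switch to another list type without touching any caller.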

Passing object references needlessly through a middleman

I often find myself needing a reference to an object that is several objects away, or so it seems. The options I see are passing a reference through a middle-man or just making something available statically. I understand the danger of global scope, but passing a reference through an object that does nothing with it feels ridiculous. I'm okay with a little bit of passing around, I suppose. I suspect there's a line to be drawn somewhere.
Does anyone have insight on where to draw this line?
Or a good way to deal with the problem of distributing references amongst dependent objects?
Use the Law of Demeter (with moderation and good taste, not dogmatically). If you're coding a.b.c.d.e, something IS wrong -- you've nailed forevermore the implementation of a to have a b which has a c which... EEP!-) One or at the most two dots is the maximum you should be using. But the alternative is NOT to plump things into globals (and ensure thread-unsafe, buggy, hard-to-maintain code!), it is to have each object "surface" those characteristics it is designed to maintain as part of its interface to clients going forward, instead of just letting poor clients go through such unending chains of nested refs!
This smells of an abstraction that may need some improvement. You seem to be violating the Law of Demeter.
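As a minimal illustration of "surfacing" a characteristic instead of chaining through nested references, consider the Java sketch below. Order, Customer and Address are invented for the example and do not come from the question.

```java
// Hypothetical domain classes, used only to illustrate the idea.
class Order {
    private final Customer customer;

    Order(Customer customer) {
        this.customer = customer;
    }

    // Callers write order.getShippingCity() (one dot) instead of
    // order.getCustomer().getAddress().getCity().
    String getShippingCity() {
        return customer.getCity();
    }

    public static void main(String[] args) {
        Order order = new Order(new Customer(new Address("Oslo")));
        System.out.println(order.getShippingCity());
    }
}

class Customer {
    private final Address address;

    Customer(Address address) {
        this.address = address;
    }

    // Customer surfaces the one fact clients need rather than handing out its Address.
    String getCity() {
        return address.getCity();
    }
}

class Address {
    private final String city;

    Address(String city) {
        this.city = city;
    }

    String getCity() {
        return city;
    }
}
```

If Customer later stores its city differently, only Customer changes; Order's clients never knew an Address existed.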
In some cases a global isn't too bad.
Consider, you're probably programming against an operating system's API. That's full of globals, you can probably access a file or the registry, write to the console. Look up a window handle. You can do loads of stuff to access state that is global across the whole computer, or even across the internet... and you don't have to pass a single reference to your class to access it. All this stuff is global if you access the OS's API.
So, when you consider the number of global things that often exist, a global in your own program probably isn't as bad as many people try and make out and scream about.
However, if you want to have very nice OO code that is all unit testable, I suppose you should be writing wrapper classes around any access to globals, whether they come from the OS or are declared by you, to encapsulate them. This means your class that uses this global state can get references to the wrappers, and they could be replaced with fakes.
Hmm, anyway. I'm not quite sure what advice I'm trying to give here, other than say, structuring code is all a balance! And, how to do it for your particular problem depends on your preferences, preferences of people who will use the code, how you're feeling on the day on the academic to pragmatic scale, how big the code base is, how safety critical the system is and how far off the deadline for completion is.
I believe your question is revealing something about your classes. Maybe the responsibilities could be improved? Maybe moving some code would solve problems?
Tell, don't ask.
That's how it was explained to me. There is a natural tendency to call classes to obtain some data. Taken too far, asking too much typically leads to heavy "getter sequences". But there is another way. I must admit it is not easy to find, but it improves gradually, both in a specific code base and in the coder's habits.
Class A wants to perform a calculation, and asks B for its data. Sometimes, it is appropriate that A tells B to do the job, possibly passing some parameters. This could replace B's "getName()", used by A to check the validity of the name, by an "isValid()" method on B.
"Asking" has been replaced by "telling" (calling a method that executes the computation).
For me, this is the question I ask myself when I find too many getter calls. Gradually, the methods find their place in the correct object, and everything gets a bit simpler: I have fewer getters and fewer calls to them. I have less code, and it carries more meaning and aligns better with the functional requirement.
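The getName()/isValid() example from this answer might look like the following in Java. The validity rule (a non-blank name) is an assumption made up for the sketch; the answer does not say what "valid" means.

```java
// A is the client; B owns the data. Both follow the answer's naming.
class A {
    // "Ask" style: A pulls the data out of B and applies the rule itself.
    boolean acceptsByAsking(B b) {
        String name = b.getName();
        return name != null && !name.trim().isEmpty();
    }

    // "Tell" style: A delegates the question to the object that owns the data.
    boolean acceptsByTelling(B b) {
        return b.isValid();
    }

    public static void main(String[] args) {
        A a = new A();
        System.out.println(a.acceptsByAsking(new B("widget")));
        System.out.println(a.acceptsByTelling(new B("   ")));
    }
}

class B {
    private final String name;

    B(String name) {
        this.name = name;
    }

    // Exposes raw data; every caller has to re-implement the rule.
    String getName() {
        return name;
    }

    // Hypothetical rule: a name is valid when it is not blank.
    boolean isValid() {
        return name != null && !name.trim().isEmpty();
    }
}
```

With the "tell" version the rule lives in one place, so changing what counts as valid does not require hunting down every caller of getName().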
Move the data around
There are other cases where I move some data. For example, if a field moves two objects up, the length of the "getter chain" is reduced by two.
I believe nobody can find the correct model at first.
I first think about it (using hand-written diagrams is quick and a big help), then code it, then think again facing the real thing... Then I code the rest, and any smells I feel in the code, I think again...
Split and merge objects
If a method on A needs data from C, with B as a middle man, I can check whether A and C have something in common. Possibly, A or a part of A could become C (possible splitting of A, merging of A and C) ...
However, there are cases where I keep the getters of course.
But it's less likely a long chain will be created.
A long chain will probably get broken by one of the techniques above.
I have three patterns for this:
Pass the necessary reference to the object's constructor -- the reference can then be stored as a data member of the object, and doesn't need to be passed again; this implies that the object's factory has the necessary reference. For example, when I'm creating a DOM, I pass the element name to the DOM node when I construct the DOM node.
Let things remember their parent, and get references to properties via their parent; this implies that the parent or ancestor has the necessary property. For example, when I'm creating a DOM, there are various things which are stored as properties of the top-level DomDocument ancestor, and its child nodes can access those properties via the reference which each one has to its parent.
Put all the different things which are passed around as references into a single class, and then pass around just that one class instance as the only thing that's passed around. For example, there are many properties required to render a DOM (e.g. the GDI graphics handle, the viewport coordinates, callback events, etc.) ... I put all of these things into a single 'Context' instance which is passed as the only parameter to the methods of the DOM nodes to be rendered, and each method can get whichever properties it needs out of that context parameter.
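The three patterns above can be combined in one small Java sketch. Node and RenderContext are hypothetical, and a StringBuilder stands in for the graphics handle; this is not the answerer's actual DOM code.

```java
import java.util.ArrayList;
import java.util.List;

class Node {
    private final String name;                       // pattern 1: passed to the constructor
    private final List<Node> children = new ArrayList<>();
    private Node parent;                             // pattern 2: remember the parent

    Node(String name) {
        this.name = name;
    }

    Node addChild(Node child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    // Derived through the parent chain, so it never has to be passed in.
    int depth() {
        return parent == null ? 0 : parent.depth() + 1;
    }

    // Pattern 3: the context is the only parameter handed down the tree.
    void render(RenderContext ctx) {
        for (int i = 0; i < depth(); i++) {
            ctx.output.append("  ");
        }
        ctx.output.append(name).append('\n');
        for (Node child : children) {
            child.render(ctx);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("html").addChild(new Node("body").addChild(new Node("p")));
        RenderContext ctx = new RenderContext(800, 600);
        root.render(ctx);
        System.out.print(ctx.output);
    }
}

// Hypothetical context object: bundles everything rendering might need.
class RenderContext {
    final int viewportWidth;
    final int viewportHeight;
    final StringBuilder output = new StringBuilder(); // stand-in for a graphics handle

    RenderContext(int viewportWidth, int viewportHeight) {
        this.viewportWidth = viewportWidth;
        this.viewportHeight = viewportHeight;
    }
}
```

Adding another rendering property later means adding a field to RenderContext, not another parameter to every render method along the way.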