Aren't class member variables as bad as global variables? - oop

We all know the following code is bad because it uses a global variable:
a = 3.14
def inc():
    global a
    a += 1
inc()
b = a
It is so bad that in Python one has to explicitly declare the variable global to "break" the ban. But in the name of OO, how come the code below is considered acceptable:
class A:
    def __init__(self):
        self.a = 3.14
    def inc(self):
        self.a += 1
A_instance = A()
A_instance.inc()
b = A_instance.a
Within the scope of the class instance, isn't the member variable simply the same as a global variable? Doesn't "self" basically grant every member function unrestricted access to all the variables, whether it needs them or not, making the code very stateful, prone to mistakes, and difficult to read (basically all the bad traits of global variables)?
We also know the clean and better way is:
def inc(_a):
    return _a + 1
a = 3.14
b = inc(a)
That way the function inc() is at least stateless and has predictable behavior. What's the best practice for using classes while keeping them as stateless as possible? Or is OO stateful by definition, so that one has to switch to functional programming without classes at all? I have also heard people say that static methods (@staticmethod) are stupid, break encapsulation, and should be avoided...

It comes down to whether you need to save state or not. If you do need to save state then it needs to be saved somewhere. Classes allow you to save state alongside the methods that operate on that state and enforce a class-local scope that is less error prone than storing state within a global scope.
As a concrete example, I recently implemented a Tree class for a forest simulation. One of the variables that is important to associate with a tree is its age. Where should the age of the tree be stored? It makes more sense to store it within the Tree class than outside of it.
If protecting the age of the tree from being manipulated by code outside the class were important, I could have used Python's excellent @property decorator for that purpose, or, less directly, the Python _varname naming convention.
Within the class, yes, the age of the tree is visible to the other methods, but that is a relatively narrow scope, certainly not akin to a global scope if your classes are written with a single responsibility in accord with SOLID object oriented design principles.
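A minimal sketch of the Tree idea described above, assuming a hypothetical Tree class that models only a species and an age: the age lives in the instance, outside code can read it through @property, and only the class's own grow method is allowed to change it.

```python
class Tree:
    """Hypothetical Tree for a forest simulation; only the age is modelled."""

    def __init__(self, species: str) -> None:
        self.species = species
        self._age = 0  # leading underscore: internal by convention

    @property
    def age(self) -> int:
        # Read-only view: there is no setter, so `tree.age = 5` raises AttributeError.
        return self._age

    def grow(self, years: int = 1) -> None:
        # The only sanctioned way to change the age from outside the class.
        if years < 0:
            raise ValueError("a tree cannot grow backwards in time")
        self._age += years


oak = Tree("oak")
oak.grow(3)
print(oak.age)  # 3
```

The state is still shared by the methods of one object, but nothing outside the class can rebind it by accident, which is the narrower scope the answer contrasts with a global.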

Now, the reason self is not considered bad practice is that if you create a class, everything within it ought to be related in one way or another, usually through the instance attributes that are initialized in the __init__ dunder method. Basically, if you are going to create a class, there ought to be a link of some sort between its members: a "class global", if you will, which is what those attributes are.
Creating classes that hard-code their state through self, like this:
class A:
    def __init__(self):
        self.a = 3.14
    def inc(self):
        self.a += 1
is not something you see often; instead, people choose the style that many consider cleaner and more predictable:
class A:
    def __init__(self, a):
        self.a = a
    def inc(self):
        self.a += 1
With this, you control what each instance contains when you create it, because instances are created with arguments:
A_instance = A(3.14) #a will be 3.14 in this instance
B_instance = A(3.142) #a will be 3.142 in this instance
and so on.
Class variables, which are declared inside the class but outside the __init__ function, are considerably different. They look like this:
class A:
    a = 22/7
    def __init__(self):
        ...
With this, a acts like a global variable throughout the class (it belongs to the class itself, not to an instance). Here is what that looks like in use:
class A:
    a = 22/7
    def __init__(self, item):
        self.item = item
A_instance = A(3.14)
B_instance = A(5)
Although A_instance and B_instance are completely different, and A_instance.item equals 3.14 while B_instance.item equals 5, both of them see the class variable a as 22/7:
A_instance.a #this will return a value of 22/7
B_instance.a #this will also return a value of 22/7
So the class variable remains consistent/the same throughout all instances of that class unless it is actively altered.
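A short sketch extending the example above, showing what "actively altered" means in practice: rebinding the class variable on the class is seen through every instance, whereas assigning to it through one instance creates a separate instance attribute that shadows the class variable for that instance only.

```python
class A:
    a = 22 / 7  # class variable: a single value stored on the class itself

    def __init__(self, item):
        self.item = item  # instance variable: one value per instance


A_instance = A(3.14)
B_instance = A(5)

A.a = 3.0  # rebinding on the class is visible through every instance
print(A_instance.a, B_instance.a)  # 3.0 3.0

A_instance.a = 1.0  # assigning through an instance creates an instance attribute
print(A_instance.a, B_instance.a, A.a)  # 1.0 3.0 3.0
```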
The point is, if you are going to create a class, it should have specific variables that are intertwined with each other and that affect a particular group of functions; those are the functions you make into methods. Used that way, self within a class is not wrong.

Related

When is a class a data class?

I know what classes are about, but for better understanding I need a use case. Recently I discovered the construct of data classes. I get the idea behind normal classes, but I cannot imagine a real use case for data classes.
When should I use a data class, and when a "normal" class? As far as I know, all classes hold data.
Can you provide a good example that distinguishes data classes from non-data classes?
A data class is used to store data. It's lighter than a normal class and can be compared to a key/value structure (dictionary, hash, etc.), but represented as an object with fixed attributes. In Kotlin, according to the documentation, the compiler automatically derives these members for a data class:
equals()/hashCode() pair
toString() of the form "User(name=John, age=42)"
componentN() functions corresponding to the properties in their order of declaration.
copy() function
It also behaves differently with respect to inheritance:
If there are explicit implementations of equals(), hashCode(), or toString() in the data class body or final implementations in a superclass, then these functions are not generated, and the existing implementations are used.
If a supertype has componentN() functions that are open and return compatible types, the corresponding functions are generated for the data class and override those of the supertype. If the functions of the supertype cannot be overridden due to incompatible signatures or due to their being final, an error is reported.
Providing explicit implementations for the componentN() and copy() functions is not allowed.
So in Kotlin, if you want to describe an object (a piece of data), you may use a data class, but if you're creating a complex application and your class needs special behavior in the constructor, inheritance, or abstraction, then you should use a normal class.
I do not know Kotlin, but in Python a dataclass can be seen as a structured dict. When you want to store an object that always has the same attributes, you should not put it in a dict but use a dataclass.
The advantage of a dataclass over a normal class is that you don't need to write the __init__ method, as it is generated automatically by the @dataclass decorator.
Example:
This is a normal class:
class Apple:
    def __init__(self, size: int, color: str, sweet: bool = True):
        self.size = size
        self.color = color
        self.sweet = sweet
Same class as a dataclass:
from dataclasses import dataclass
@dataclass
class Apple:
    size: int
    color: str
    sweet: bool = True
The advantage compared to a dict is that you know for sure which attributes it has. It can also contain methods.
The advantage over a normal class is that it is simpler to declare and makes the code lighter: the attribute names (e.g. size) are repeated three times in a normal class but appear only once in a dataclass.
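To see what the decorator generates, here is a small usage sketch for the dataclass Apple above: besides __init__, you also get a readable __repr__ and a field-by-field __eq__ without writing them.

```python
from dataclasses import dataclass


@dataclass
class Apple:
    size: int
    color: str
    sweet: bool = True


a = Apple(3, "red")
print(a)  # Apple(size=3, color='red', sweet=True)
print(a == Apple(3, "red"))  # True: compared field by field, not by identity
```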
The advantage of a normal class is that you can customize the __init__ method (you can in a dataclass too, but then I think you lose its main advantage). For example:
# You need only 2 variables to initialize your class
class Apple:
    def __init__(self, size: int, color: str):
        self.size = size
        self.color = color
        # But you get much more info from those two.
        self.sweet = color == 'red'
        self.weight = self.__compute_weight()
        # PRICE_PER_GRAM is assumed to be a constant defined elsewhere in the module
        self.price = self.weight * PRICE_PER_GRAM
    def __compute_weight(self):
        # ...
        return (self.size**2)*10  # That's a random example
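For comparison, a sketch of the same derived-attribute idea kept inside a dataclass, using the standard field(init=False) and __post_init__ hooks. PRICE_PER_GRAM is given a hypothetical value here, since the example above does not define it. Only size and color are constructor arguments; the other fields are computed right after the generated __init__ runs.

```python
from dataclasses import dataclass, field

PRICE_PER_GRAM = 0.01  # hypothetical value; the example above leaves it undefined


@dataclass
class Apple:
    size: int
    color: str
    # Derived fields: excluded from the generated __init__, filled in by __post_init__.
    sweet: bool = field(init=False)
    weight: int = field(init=False)
    price: float = field(init=False)

    def __post_init__(self) -> None:
        # Called automatically after __init__ has set size and color.
        self.sweet = self.color == "red"
        self.weight = (self.size ** 2) * 10  # same arbitrary formula as above
        self.price = self.weight * PRICE_PER_GRAM


apple = Apple(3, "red")
print(apple.sweet, apple.weight)  # True 90
```

Whether this is clearer than a hand-written __init__ is a judgment call; it keeps the generated __repr__ and __eq__, at the cost of spreading construction over two places.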
Abstractly, a data class is a pure, inert information record that doesn’t require any special handling when copied or passed around, and it represents nothing more than what is contained in its fields; it has no identity of its own. A typical example is a point in 3D space:
data class Point3D(
val x: Double,
val y: Double,
val z: Double
)
As long as the values are valid, an instance of a data class is entirely interchangeable with its fields, and it can be taken apart or rematerialized at will. Often there is even little use for encapsulation: users of the data class can just access the instance’s fields directly. The Kotlin language provides a number of convenience features when data classes are declared as such in your code, which are described in the documentation. Those are useful, for example, when building more complex data structures from data classes: you can have a hashmap assign values to particular points in space, and then look up a value using a newly constructed Point3D.
val map = HashMap<Point3D, String>()
map[Point3D(3.0, 4.0, 5.0)] = "point of interest"
println(map[Point3D(3.0, 4.0, 5.0)]) // prints "point of interest"
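The same lookup-by-value pattern in Python, offered as a rough analog rather than a translation of Kotlin's semantics: a dataclass declared with frozen=True (and the default eq=True) gets value-based __eq__ and __hash__, so a newly constructed point finds the entry stored under an equal one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point3D:
    x: float
    y: float
    z: float


# frozen=True with eq=True makes instances immutable and hashable,
# so equal points hash equally and can be used as dict keys.
points = {Point3D(3.0, 4.0, 5.0): "point of interest"}
print(points[Point3D(3.0, 4.0, 5.0)])  # point of interest
```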
For an example of a class that is not a data class, take FileReader. Underneath, this class probably keeps some kind of file handle in a private field, which you can assume to be an integer (as it actually is on at least some platforms). But you cannot expect to store this integer in a database, have another process read that same integer from the database, reconstruct a FileReader from it and expect it to work. Passing file handles between processes requires more ceremony than that, if it is even possible on a given platform. That property makes FileReader not a data class. Many examples of non-data classes will be of this kind: any class whose instances represent transient, local resources like a network connection, a position within a file or a running process, cannot be a data class. Likewise, any class where different instances should not be considered equal even if they contain the same information is not a data class either.
From the comments, it sounds like your question is really about why non-data classes exist in Kotlin and why you would ever choose not to make a data class. Here are some reasons.
Data classes are a lot more restrictive than a regular class:
They have to have a primary constructor, and every parameter of the primary constructor has to be a property.
They cannot have an empty primary constructor.
They cannot be open so they cannot be subclassed.
Here are other reasons:
Sometimes you don't want a class to have a copy function. If a class holds onto some heavy state that is expensive to copy, maybe it shouldn't advertise that it should be copied by presenting a copy function.
Sometimes you want to use an instance of a class in a Set or as Map keys without two different instances being considered as equivalent just because their properties have the same values.
The features of data classes are useful specifically for simple data holders, so the drawbacks are often something you want to avoid.
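In Python terms (again an analog, not Kotlin), the Set/Map point above corresponds to opting out of value equality: with @dataclass(eq=False) the class keeps the identity-based __eq__ and __hash__ inherited from object, so two instances with identical fields stay distinct in a set. Connection is a hypothetical name standing in for a resource-like class.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Connection:
    """Hypothetical resource-like class: same host and port, still two connections."""
    host: str
    port: int


c1 = Connection("example.org", 443)
c2 = Connection("example.org", 443)
print(c1 == c2)       # False: identity comparison inherited from object
print(len({c1, c2}))  # 2
```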

Mixing Private and Public Attributes and Accessors in Raku

# Private attribute example
class C {
    has $!w;                    # private attribute
    multi method w { $!w }      # getter method
    multi method w ( $_ ) {     # setter method
        warn “Don’t go changing my w!”;   # some side action
        $!w = $_
    }
}
my $c = C.new;
$c.w( 42 );
say $c.w;    # prints 42
$c.w: 43;
say $c.w;    # prints 43
# but not
$c.w = 44;
# Cannot modify an immutable Int (43)
so far, so reasonable, and then
# Public attribute example
class C {
    has $.v is rw    # public attribute with automatic accessors
}
my $c = C.new;
$c.v = 42;
say $c.v;    # prints 42
# but not
$c.v( 43 );  # or $c.v: 43
# Too many positionals passed; expected 1 argument but got 2
I like the immediacy of the = assignment, but I need the ease of bunging in side actions that multi methods provide. I understand that these are two different worlds, and that they do not mix.
BUT - I do not understand why I can’t just go
$c.v( 43 )
To set a public attribute
I feel that raku is guiding me to not mix these two modes - some attributes private and some public and that the pressure is towards the method method (with some : sugar from the colon) - is this the intent of Raku's design?
Am I missing something?
is this the intent of Raku's design?
It's fair to say that Raku isn't entirely unopinionated in this area. Your question touches on two themes in Raku's design, which are both worth a little discussion.
Raku has first-class l-values
Raku makes plentiful use of l-values being a first-class thing. When we write:
has $.x is rw;
The method that is generated is:
method x() is rw { $!x }
The is rw here indicates that the method is returning an l-value - that is, something that can be assigned to. Thus when we write:
$obj.x = 42;
This is not syntactic sugar: it really is a method call, and then the assignment operator being applied to the result of it. This works out, because the method call returns the Scalar container of the attribute, which can then be assigned into. One can use binding to split this into two steps, to see it's not a trivial syntactic transform. For example, this:
my $target := $obj.x;
$target = 42;
Would be assigning to the object attribute. This same mechanism is behind numerous other features, including list assignment. For example, this:
($x, $y) = "foo", "bar";
Works by constructing a List containing the containers $x and $y, and then the assignment operator in this case iterates each side pairwise to do the assignment. This means we can use rw object accessors there:
($obj.x, $obj.y) = "foo", "bar";
And it all just naturally works. This is also the mechanism behind assigning to slices of arrays and hashes.
One can also use Proxy in order to create an l-value container where the behavior of reading and writing it are under your control. Thus, you could put the side-actions into STORE. However...
Raku encourages semantic methods over "setters"
When we describe OO, terms like "encapsulation" and "data hiding" often come up. The key idea here is that the state model inside the object - that is, the way it chooses to represent the data it needs in order to implement its behaviors (the methods) - is free to evolve, for example to handle new requirements. The more complex the object, the more liberating this becomes.
However, getters and setters are methods that have an implicit connection with the state. While we might claim we're achieving data hiding because we're calling a method, not accessing state directly, my experience is that we quickly end up at a place where outside code is making sequences of setter calls to achieve an operation, which is a form of the feature envy anti-pattern. And if we're doing that, it's pretty certain we'll end up with logic outside of the object that does a mix of getter and setter operations to achieve an operation. Really, these operations should have been exposed as methods with names that describe what is being achieved. This becomes even more important if we're in a concurrent setting; a well-designed object is often fairly easy to protect at the method boundary.
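To make the feature-envy point concrete, here is a hedged sketch in Python rather than Raku (Python being the language used for the other added examples in this collection); Account and withdraw are hypothetical names. Instead of reading a balance and writing it back through a setter, the caller asks the object to perform a named operation, and the validation stays next to the state it protects.

```python
class Account:
    """Hypothetical example: callers request an operation, not field updates."""

    def __init__(self, balance: int = 0) -> None:
        self._balance = balance  # internal state model, free to evolve later

    @property
    def balance(self) -> int:
        return self._balance

    def withdraw(self, amount: int) -> None:
        # The rule lives with the data instead of being repeated in every caller.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount


acct = Account(100)
# Setter style would push the check into the caller, e.g.:
#     if acct.get_balance() >= 30: acct.set_balance(acct.get_balance() - 30)
acct.withdraw(30)
print(acct.balance)  # 70
```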
That said, many uses of class are really record/product types: they exist to simply group together a bunch of data items. It's no accident that the . sigil doesn't just generate an accessor, but also:
Opts the attribute into being set by the default object initialization logic (that is, a class Point { has $.x; has $.y; } can be instantiated as Point.new(x => 1, y => 2)), and also renders that in the .raku dumping method.
Opts the attribute into the default .Capture object, meaning we can use it in destructuring (e.g. sub translated(Point (:$x, :$y)) { ... }).
Which are the things you'd want if you were writing in a more procedural or functional style and using class as a means to define a record type.
The Raku design is not optimized for doing clever things in setters, because that is considered a poor thing to optimize for. It's beyond what's needed for a record type; in some languages we could argue we want to do validation of what's being assigned, but in Raku we can turn to subset types for that. At the same time, if we're really doing an OO design, then we want an API of meaningful behaviors that hides the state model, rather than to be thinking in terms of getters/setters, which tend to lead to a failure to colocate data and behavior, which is much of the point of doing OO anyway.
BUT - I do not understand why I can’t just go $c.v( 43 ) To set a public attribute
Well, that's really up to the architect. But seriously, no, that's simply not the standard way Raku works.
Now, it would be entirely possible to create an Attribute trait in module space, something like is settable, that would create an alternate accessor method that would accept a single value to set the value. The problem with doing this in core is that I think there are basically two camps in the world about the return value of such a mutator: should it return the new value, or the old value?
Please contact me if you're interested in implementing such a trait in module space.
I currently suspect you just got confused.1 Before I touch on that, let's start over with what you're not confused about:
I like the immediacy of the = assignment, but I need the ease of bunging in side actions that multi methods provide. ... I do not understand why I can’t just go $c.v( 43 ) To set a public attribute
You can do all of these things. That is to say you use = assignment, and multi methods, and "just go $c.v( 43 )", all at the same time if you want to:
class C {
has $!v;
multi method v is rw { $!v }
multi method v ( :$trace! ) is rw { say 'trace'; $!v }
multi method v ( $new-value ) { say 'new-value'; $!v = $new-value }
}
my $c = C.new;
$c.v = 41;
say $c.v; # 41
$c.v(:trace) = 42; # trace
say $c.v; # 42
$c.v(43); # new-value
say $c.v; # 43
A possible source of confusion1
Behind the scenes, has $.foo is rw generates an attribute and a single method along the lines of:
has $!foo;
method foo () is rw { $!foo }
The above isn't quite right though. Given the behavior we're seeing, the compiler's autogenerated foo method is somehow being declared in such a way that any new method of the same name silently shadows it.2
So if you want one or more custom methods with the same name as an attribute you must manually replicate the automatically generated method if you wish to retain the behavior it would normally be responsible for.
Footnotes
1 See jnthn's answer for a clear, thorough, authoritative accounting of Raku's opinion about private vs public getters/setters and what it does behind the scenes when you declare public getters/setters (i.e. write has $.foo).
2 If an autogenerated accessor method for an attribute was declared only, then Raku would, I presume, throw an exception if a method with the same name was declared. If it were declared multi, then it should not be shadowed if the new method was also declared multi, and should throw an exception if not. So the autogenerated accessor is being declared with neither only nor multi but instead in some way that allows silent shadowing.

OOP: class inheritance to add just one property vs constructor argument

I'm new to OOP and I'm in the following situation: I have something like a report "Engine" that is used for several reports, the only thing needed is the path of a config file.
I'll code in Python, but this is a language-agnostic question. So, I have the following two approaches:
A) class ReportEngine is an abstract class that has everything needed BUT the path for the config file. This way you just have to instantiate the ReportX class
class ReportEngine(object):
    ...
class Report1(ReportEngine):
    _config_path = '...'
class Report2(ReportEngine):
    _config_path = '...'
report_1 = Report1()
B) class ReportEngine can be instantiated by passing the config file path
class ReportEngine(object):
    def __init__(self, config_path):
        self._config_path = config_path
    ...
report_1 = ReportEngine(config_path="/files/...")
Which approach is the right one? In case it matters, the report object would be inserted in another class, using composition.
IMHO the A) approach is better if you need to implement report engines that are different from each other. If your reports are populated using different logic, follow this approach.
But if the only difference among your report engines is the _config_path, I think approach B) is the right one for you. Obviously, this way you'll have shared logic to build every report, regardless of the report type.
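A minimal sketch of approach B combined with the composition mentioned in the question. Dashboard and the config paths are hypothetical placeholders (the question elides the real paths). When only the data differs between reports, a single class plus a constructor argument is enough, and no subclass is needed.

```python
class ReportEngine:
    """Approach B: one engine class; reports differ only by config path."""

    def __init__(self, config_path: str) -> None:
        self._config_path = config_path

    @property
    def config_path(self) -> str:
        return self._config_path


class Dashboard:
    """Hypothetical owner that holds a report by composition."""

    def __init__(self, report: ReportEngine) -> None:
        self.report = report


# Placeholder paths for illustration only.
sales = Dashboard(ReportEngine("/files/sales.cfg"))
stock = Dashboard(ReportEngine("/files/stock.cfg"))
print(sales.report.config_path)  # /files/sales.cfg
```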
Generally speaking, put everything that every Report has in the superclass, and put specific things in the subclasses.
So in your case, put _config_path in the superclass ReportEngine as in B) (since every Report has a _config_path), but instantiate specific Reports as in A), so that each Report sets its own path.
I don't know Python, but did a quick search for the proper syntax for Python 3.0+, I hope it makes sense:
class ReportEngine(object):
    def __init__(self, config_path):
        self._config_path = config_path
    def printPath(self):
        print(self._config_path)
    ...
class Report1(ReportEngine):
    def __init__(self):
        super().__init__('/files/report1/...')
Then a
reportObj = Report1()
reportObj.printPath()
should print
'/files/report1/...'
Basically, the main difference is that approach A is more flexible than B (a change in one report does not influence the other reports), while B is simpler and clearer (it shows exactly where the difference is), but a change affecting only one report type would require more work. If you are pretty sure the reports won't change over time, go with B; if you expect the reports to have less and less in common in the future, go with A.

How to call appropriate subclass constructor inside base class constructor in MATLAB

I'm trying to use single inheritance in Matlab, and to write a base class constructor that allows the creation of arrays of objects, including empty arrays, and which is inherited by subclasses. I can't work out how to do it without using some incredibly clunky code. There must be a better way.
In this toy example, my base class is called MyBaseClass, and my subclass is called MySubClass. Each can be constructed with a single numeric argument, or no arguments (in which case NaN is assumed). In the toy example my SubClass is trivial and doesn't extend the behavior of MyBaseClass in any way, but obviously in practice it would do more stuff.
I want to be able to call the constructor of each as follows:
obj = MyBaseClass; % default constructor of 'NaN-like' object
obj = MyBaseClass([]); % create an empty 0x0 array of type MyBaseClass
obj = MyBaseClass(1); % create a 1x1 array of MyBaseClass with value 1
obj = MyBaseClass([1 2; 3 4]) % create a 2x2 array of MyBaseClass with values 1, 2, 3, 4.
And the same four calls for MySubClass.
The solution I have found needs to call eval(class(obj)) in order to recover the subclass name and construct code in strings to call while in the base class constructor. This seems clunky and bad. (And it's somewhat surprising to me that it's possible, but it is.) I guess I could duplicate more logic between the MyBaseClass and MySubClass constructors, but that also seems clunky and bad, and misses the point of inheritance. Is there a better way?
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% MyBaseClass.m
classdef MyBaseClass
properties
data = NaN
end
methods
% constructor
function obj = MyBaseClass(varargin)
if nargin == 0
% Handle the no-argument case
return
end
arg = varargin{1};
% assume arg is a numeric array
if isempty(arg)
% Handle the case ClassName([])
% Can't write this, because of subclasses:
% obj = MyBaseClass.empty(size(arg));
obj = eval([class(obj) '.empty(size(arg))']);
return
end
% arg is an array
% Make obj an array of the correct size by allocating the nth
% element. Need to recurse for the no-argument case of the
% relevant class constructor, which might not be this one.
% Can't write this, because of subclasses
% obj(numel(arg)) = MyBaseClass;
obj(numel(arg)) = eval(class(obj));
% Rest of the constructor - obviously in this toy example,
% could be simplified.
wh = ~isnan(arg);
for i = find(wh(:))'
obj(i).data = arg(i);
end
% And reshape to the size of the original
obj = reshape(obj, size(arg));
end
end
end
% end of MyBaseClass.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% MySubClass.m
classdef MySubClass < MyBaseClass
methods
function obj = MySubClass(varargin)
obj = obj@MyBaseClass(varargin{:});
end
end
end
% end of MySubClass.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Your solution is functional and embraces some loose MATLAB typing to achieve what you want. However, getting clean and structured OOP is probably going to require losing some of the functionality you want. At the same time, the best option for avoiding code duplication is templated/generic container classes but these are not supported in MATLAB at this time.
Your code mirrors the MATLAB documentation on Building Arrays in the Constructor and relies on MATLAB being a loosely typed language that enabled you to convert an object into an array of objects without problem. Exploiting this powerful and flexible feature of MATLAB does introduce some organizational issues and may undermine your efforts at clean, object oriented code.
Problems begin because the MyBaseClass constructor is not a true constructor for MyBaseClass.
Wikipedia says:
"In object-oriented programming, a constructor (sometimes shortened to ctor) in a class is a special type of subroutine called at the creation of an object. It prepares the new object for use, often accepting parameters which the constructor uses to set any member variables required when the object is first created. It is called a constructor because it constructs the values of data members of the class."
Notice that the MyBaseClass constructor is not constructing values for the object members. Instead, it is a function that sets the object equal to an array of objects of type MyBaseClass and tries to set their data members to some value. You can see where obj is destroyed and set to an array here:
obj(numel(arg)) = eval(class(obj));
This behavior is especially unhelpful when you derive MySubClass from MyBaseClass, because MyBaseClass isn’t supposed to assign a new object to the variable obj; MySubClass has already created the new object in obj and is simply asking MyBaseClass to construct the portion of the existing object in obj that MyBaseClass knows the details for.
Some clarity might be gained by noting that when you enter the constructor for both MyBaseClass and MySubClass, the variable obj is already populated with a perfectly good instance of the class. Good OOP practice would have you keep this original instance, use it in the base class constructor, and only act to populate its members in the constructor, not overwrite the object entirely with something new.
My conclusion would be to not assign obj to be an array inside of MyBaseClass. Instead, I would recommend creating a class MyBaseClassArray that creates an array of MyBaseClass objects.
Unfortunately, you would also need to create a duplicate class MySubClassArray that creates an array of MySubClass objects. Languages like C++ and Java get around this code duplication issue with templates and generics, respectively, but MATLAB does not currently support any form of templates (http://www.mathworks.com/help/techdoc/matlab_oop/brqzfut-1.html). Without templates there is no good way to avoid code duplication.
You could try to avoid some duplication by creating a generic CreateClassArray function that takes the string name of a class to create and the constructor arguments to use for each object, but now we are coming back to code that looks like your original. The only difference is that now we have a clear division between the array class and the individual objects. The truth is that although MATLAB does not support templates, its flexible classes and typing system allow you to use eval() as you have, to change code, overwrite obj at will, and create code that acts generically across classes. The cost? Readability, speed, and the uncomfortable feeling you got when you saw your base class constructing the subclass.
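For comparison only (Python, not MATLAB): in a language where classes are first-class objects, a generic creation function can receive the class itself instead of its name as a string, so no eval is needed. This sketch keeps the array logic in a separate alternate constructor rather than in __init__, in the spirit of the suggestion above; from_values is a hypothetical name, and a Python list stands in for a MATLAB object array.

```python
import math


class MyBaseClass:
    def __init__(self, data=math.nan):
        self.data = data  # NaN-like default, mirroring the question

    @classmethod
    def from_values(cls, values):
        # `cls` is whichever class the method was called on, so subclasses
        # get instances of themselves without building code from a class name.
        return [cls(v) for v in values]


class MySubClass(MyBaseClass):
    pass


objs = MySubClass.from_values([1, 2, 3])
print(type(objs[0]).__name__)  # MySubClass
print(MyBaseClass.from_values([]))  # []
```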
In short, you used MATLAB’s flexibility to overwrite obj in the constructor with an array, to avoid creating a separate container class for MyBaseClass. You then used eval to make up for the lack of a template feature in MATLAB that would let you reuse your array creation code for all types. In the end, your solution is functional and reduces code duplication, but it does require some unnatural behavior from your classes. It’s just a trade-off you have to make.

Best Practice: Abstract data via get/set or allow access to the variable?

I was writing some python classes the other day, and created a class like this:
class Option:
    Name = ""
    Description = ""
    def __init__(self, name, descr):
        self.Name = name
        self.Description = descr
        # Per-instance dict: a class-level Criteria = {} would be shared by every Option
        self.Criteria = {}
    def addCriteria(self, cname, ctype):
        ...
In my program, I was accessing the data in the class directly, i.e.:
cname = opt.Name
While this was a conscious decision, I know a lot of classes would have abstracted this as:
cname = opt.getName()
Should I have taken the time to abstract all the data and not allow direct access? What triggers when you would add access methods for class data?
Mike
Even though this goes against everything you learn when doing OOP, I would suggest that you defer the decision until it really turns out that you need some "magic" to take place on attribute access. In Python, it is easy enough to refactor the class providing the attributes using the built-in property function:
class ItsMagic(object):
    _name = None
    def _get_name(self):
        return self._name
    def _set_name(self, new_name):
        self._name = new_name
    Name = property(_get_name, _set_name)
magic = ItsMagic()
magic.Name = "Foo"
This is transparent to the clients of this class, i.e., they usually cannot tell (and should not care) whether the property reference is handled by direct access to the actual object attribute or by means of magic getter/setter functions, and it can be applied later (if it turns out to be required, that is).
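The same refactoring written with the decorator form of property, as a sketch; the non-empty check is a hypothetical rule added to show why you might introduce a setter later. Callers keep writing opt.name and opt.name = ..., exactly as they did when name was a plain attribute; only the class changes.

```python
class Option:
    def __init__(self, name: str) -> None:
        self.name = name  # goes through the property setter below

    @property
    def name(self) -> str:
        return self._name

    @name.setter
    def name(self, value: str) -> None:
        # Validation added later without changing how callers use `opt.name`.
        if not value:
            raise ValueError("name must not be empty")
        self._name = value


opt = Option("verbose")
opt.name = "quiet"  # same syntax as assigning a plain attribute
print(opt.name)     # quiet
```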